Dissection of a Nuclear Localization Signal*

The regulated process of protein import into the nucleus of a eukaryotic cell is mediated by specific nuclear localization signals (NLSs) that are recognized by protein import receptors. This study seeks to decipher the energetic details of NLS recognition by the receptor importin α through quantitative analysis of variant NLSs. The relative importance of each residue in two monopartite NLS sequences was determined using an alanine scanning approach. These measurements yield an energetic definition of a monopartite NLS sequence where a required lysine residue is followed by two other basic residues in the sequence K(K/R)X(K/R). In addition, the energetic contributions of the second basic cluster in a bipartite NLS (~3 kcal/mol) as well as the energy of inhibition of the importin α importin β-binding domain (~3 kcal/mol) were also measured. These data allow the generation of an energetic scale of nuclear localization sequences based on a peptide's affinity for the importin α-importin β complex. On this scale, a functional NLS has a binding constant of ~10 nM, whereas a nonfunctional NLS has a 100-fold weaker affinity of 1 μM. Further correlation between the current in vitro data and in vivo function will provide the foundation for a comprehensive quantitative model of protein import.

The sequestering of genetic material in the nucleus by eukaryotic cells provides a powerful mechanism for the regulation of gene expression and other cellular processes through the selective translocation of proteins between the nucleus and the cytoplasm (1-3). Recently, the regulated transport of proteins across the nuclear envelope has been recognized as a crucial step in an increasing number of cellular processes (4-6). Understanding the mechanisms of regulated protein translocation through nuclear pores requires a detailed definition of the signals that mark a macromolecular complex for nuclear import or export.
The best characterized mechanism for translocation across the nuclear envelope is protein import, which depends on the "classical" nuclear localization signal (NLS) (7). This NLS consists of a cluster of basic residues (monopartite) or two clusters of basic residues separated by 10-12 residues (bipartite) (8,9). This signal is recognized by the heterodimeric import receptor complex comprising importin α and importin β (3). Importin α is an adapter protein that consists of a small N-terminal importin β-binding (IBB) domain and a larger C-terminal NLS-binding domain (10-14). Importin β does not directly interact with the NLS cargo but acts to direct importin α to the nuclear pore (15,16). In the absence of importin β, "NLS-like" sequences of the N-terminal IBB domain form an intramolecular bond with the NLS-binding site, inhibiting the interaction between importin α and the NLS cargo. Evidence for this auto-inhibition is found in the crystal structure of full-length importin α as well as in in vitro binding assays (16-19). Thus, the interaction between importin α and the NLS cargo is regulated by importin β. In an analogous manner, the interaction between importin α and importin β is regulated by the small GTPase Ran. Both structural and biochemical evidence indicate that in the GTP-bound state, Ran binds tightly to importin β, resulting in a conformational change that triggers the release of importin α (20-23).
This cascade of regulated interactions suggests a molecular model for unidirectional import of proteins into the nucleus (2,3,24). The small GTPase Ran serves to distinguish the nucleus from the cytoplasm (23). In the nucleus, Ran is found primarily in a GTP-bound form, whereas, in the cytoplasm, the GDP-bound form is dominant (3). Thus, in the cytoplasm, importin β is free to bind to importin α, removing the inhibition of NLS binding by the IBB domain. The importin β-bound importin α is then free to bind to the NLS cargo, and the ternary complex is translocated through the nuclear pore via interactions between importin β and the nucleoporins (15,16). In the nucleus, the ternary complex encounters Ran in the GTP-bound state. Ran-GTP binds to importin β, releasing the importin α-NLS complex (19). The IBB domain of importin α then competes with the NLS cargo for the NLS-binding site, facilitating the release of the NLS cargo into the nucleus. This model for protein import requires precise tuning of the thermodynamic interactions between the various species for the reaction to proceed efficiently in a single direction. For example, the interaction between the NLS and importin α must be tuned such that the affinity of the importin α-importin β complex for the NLS is tight enough to allow cytoplasmic capture and nuclear translocation of the cargo. However, the interaction between the NLS and importin α alone (in competition with the IBB domain) must be weak enough to allow efficient release of the NLS cargo into the nucleus. These restrictions yield a simple thermodynamic definition of a classical nuclear localization signal.
Presently, the definition of a nuclear localization signal sequence is somewhat vague owing to the diversity of sequences that can apparently act as a functional NLS (7). The NLS of the SV40 large T antigen, with a sequence of PKKKRKV, provides the prototypical monopartite NLS defined by a cluster of basic residues. Functional assays indicate that a lysine is essential in the third position of this sequence, but the importance of the other residues is somewhat ambiguous (8,25). The NLS from the c-myc proto-oncogene, with a sequence of PAAKRVKLD, illustrates the diversity of peptide sequences that can act as functional localization signals (26). In analogy with the third lysine of the SV40 NLS, the fourth lysine was found to be critical in the Myc NLS. In addition, the initial proline and the final Leu-Asp dipeptide were also found to be important in the function of this NLS sequence (27). The crystal structures of importin α in complex with both the SV40 and the Myc NLS peptides have been reported recently (12,28). These crystal structures showed that the similar but distinct NLS sequences bound to the same site on importin α in an extended conformation. As shown in Fig. 1, the SV40 sequence PKKKRKV bound in a nearly identical conformation to the Myc sequence of PAAKRVKLD with equivalent residues from each binding in identical pockets on the protein (see Table I).
In correlation with the functional data, the interactions in pocket 1 are extensive, including three hydrogen bonds, a salt bridge, and hydrophobic interactions with the aliphatic segment of the lysine side chain. This is consistent with the hypothesis that the amino acid specificities of the NLS binding site of importin α define the sequence requirements for a functional nuclear localization signal.
A common variant of the classical NLS is a bipartite sequence with a small cluster of basic residues positioned 10-12 residues N-terminal to a monopartite-like sequence. The prototypal bipartite NLS is found in nucleoplasmin with the sequence KRPAATKKAGQAKKKKL (9). The additional binding energy contributed by the upstream cluster of basic residues relaxes the requirements for the downstream monopartite-like sequences. In fact, a nonfunctional SV40 variant where the critical third lysine residue is replaced with a threonine (PKTKRKV) can be converted into a functional NLS through the addition of a second properly positioned basic cluster (KRTADSQHSTPPKTKRKV) (27).
A complete understanding of nuclear import signals requires a quantitative model for the import reaction that correlates NLS amino acid sequence, in vitro interaction energies, and in vivo functionality. The first steps toward such a model were taken with the report of a quantitative assay for the affinity between importin α and an NLS sequence using an enzyme-linked immunosorbent assay-based method (29-32). We have recently expanded on these efforts by reporting a fluorescence-based assay for NLS-importin α interactions that is performed in solution at equilibrium (17). With this fluorescence assay, we have begun to reconstitute the molecular reactions of protein import in vitro to provide a detailed thermodynamic description of the translocation reaction. A detailed description of the energetic requirements for an NLS sequence will facilitate the recognition of these sequences in protein primary structure as well as suggest possible modes for the regulation of protein import.
Here we attempt to decipher the energetic details of NLS recognition by importin α through quantitative analysis of variant NLS affinities. The relative importance of each residue in two monopartite NLS sequences was determined using an alanine scanning approach. These measurements yield an energetic definition of a monopartite NLS sequence. In addition, the energetic contributions of the second basic cluster in a bipartite NLS as well as the energy of inhibition of the importin α IBB domain were also measured. These data allow the generation of an energetic scale of nuclear localization sequences and provide the foundation for a comprehensive quantitative model of protein import.

Generation of NLS-GFP Variants and Protein Purification-Various NLS sequences were cloned as in-frame N-terminal fusions to the green fluorescent protein (GFP) through polymerase chain reaction in a pET-28a (Novagen) expression vector. The amino acid sequences of the NLS variants are enumerated in Table II. The identity of each variant was confirmed by DNA sequencing. Each of these variants was expressed and purified as described elsewhere (17). Both full-length importin α and a fragment consisting of residues 89-530 (ΔIBB importin α) were expressed and purified as described (17).
FIG. 1. [...] Table I (note that pocket 3 is obscured by a loop of importin α in A). In the NLS peptides, carbon atoms are shown in white and oxygen and nitrogen atoms are drawn in red and blue, respectively. The surface of importin α is rendered according to the electrostatic potential, with red denoting a negative charge and blue denoting positive. The NLS binds in a predominantly acidic groove of importin α. The rendering was made using the program GRASP (36).

TABLE I
SV40: PKKKRKV
Myc: PAAKRVKLD
a The position of each residue when bound to importin α as shown in Fig. 1.

NLS Binding Assay-The dissociation constant for the binding between NLS-GFP fusion proteins and importin α was measured through a fluorescence depolarization assay (17). The anisotropy of the GFP fluorescence was monitored using an ISS PC1 fluorimeter with the sample maintained at 25°C. The sample was excited at a wavelength of 492 nm, and the emitted fluorescence was measured after filtering through a 510-nm high pass filter. The changes in the anisotropy of NLS-GFP when titrated with various concentrations of importin α were then used to calculate the fraction of NLS-GFP bound, yielding a binding isotherm for the reaction. The binding isotherm was then fit through nonlinear regression to a simple binding equation, where Y = fraction bound, Kd = dissociation constant, Lt = total ligand concentration (importin α), and Rt = total receptor concentration (NLS-GFP). Values for Kd were then derived from the fitted parameters. Each Kd value was calculated from two to four titrations with 18-25 data points in each titration. For each data point, three measurements of the anisotropy were made. By combining the data from repeated titrations, uncertainties were estimated for each anisotropy value. Confidence limits for the fitted Kd values were then estimated by repeated fitting of the experimental data after applying random normally distributed shifts to each anisotropy value (33). Errors in Kd are reported at a 95% confidence limit. Data interpretation was performed using scripts for the program Mathematica (Wolfram Research). Example scripts are available from the authors upon request.
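The fitting and error-estimation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact form of the authors' binding equation is not reproduced in this text, so the sketch assumes the standard quadratic expression for 1:1 binding with ligand depletion; the function names, the grid-search fitting routine, and the synthetic titration data are all hypothetical and are not the authors' Mathematica scripts.

```python
import math
import random

def fraction_bound(kd, lt, rt):
    """Fraction of receptor bound for 1:1 binding with ligand depletion.

    kd, lt (total ligand) and rt (total receptor) share one concentration unit.
    """
    b = kd + lt + rt
    return (b - math.sqrt(b * b - 4.0 * lt * rt)) / (2.0 * rt)

def sum_sq(kd, lts, ys, rt):
    """Sum of squared residuals between observed and predicted fraction bound."""
    return sum((fraction_bound(kd, lt, rt) - y) ** 2 for lt, y in zip(lts, ys))

def fit_kd(lts, ys, rt, log_lo=-1.0, log_hi=5.0, rounds=3, n=200):
    """Least-squares Kd from a repeatedly refined grid search on log10(Kd)."""
    best = log_lo
    for _ in range(rounds):
        step = (log_hi - log_lo) / n
        grid = [log_lo + i * step for i in range(n + 1)]
        best = min(grid, key=lambda g: sum_sq(10.0 ** g, lts, ys, rt))
        log_lo, log_hi = best - step, best + step
    return 10.0 ** best

def kd_interval(lts, ys, rt, sigma, trials=100, seed=0):
    """Approximate 95% limits on Kd from refits of randomly perturbed data."""
    rng = random.Random(seed)
    fits = sorted(
        fit_kd(lts, [y + rng.gauss(0.0, sigma) for y in ys], rt)
        for _ in range(trials)
    )
    return fits[int(0.025 * trials)], fits[int(0.975 * trials) - 1]

# Synthetic titration (made-up numbers): true Kd = 50 nM, 20 nM NLS-GFP.
ligand_nm = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
observed = [fraction_bound(50.0, lt, 20.0) for lt in ligand_nm]
kd_fit = fit_kd(ligand_nm, observed, 20.0)
kd_low, kd_high = kd_interval(ligand_nm, observed, 20.0, sigma=0.02)
```

A grid search on log10(Kd) is used here only because it needs no external libraries; any nonlinear least-squares routine would serve the same purpose.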
End Point Titrations-To define the relative functional stoichiometry of importin α and ΔIBB importin α, an end point titration was performed using the tight-binding BPSV40-GFP fusion protein (see Table II) as a probe. The titration was performed in the presence of 3 μM BPSV40-GFP, yielding a nearly linear relationship between the fraction of NLS bound and the concentration of both importin α proteins. For each protein, the data were fit to two lines, one for the initial linear region and one for saturation. The intercept of these two lines defines the molar equivalent of each importin α protein to the BPSV40-GFP probe.
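The two-line analysis of an end point titration can be illustrated as follows. This is a minimal sketch, not the authors' procedure: the function names and the `split` index (the point at which the data are divided into the rising and saturated regions, chosen by inspection) are hypothetical, and the example data are an idealized, made-up titration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def equivalence_point(xs, ys, split):
    """Intersect the line through points before `split` with the line through the rest."""
    m1, b1 = fit_line(xs[:split], ys[:split])
    m2, b2 = fit_line(xs[split:], ys[split:])
    return (b2 - b1) / (m1 - m2)

# Idealized titration of a 3 uM probe: linear rise, then saturation at 1.0.
conc_um = [0.5, 1.0, 1.5, 2.0, 2.5, 4.0, 5.0, 6.0, 7.0]
bound = [min(c / 3.0, 1.0) for c in conc_um]
eq_um = equivalence_point(conc_um, bound, split=5)
```

In this idealized case the two lines cross at 3 μM of added protein, i.e. one molar equivalent of the probe; a crossing at a higher nominal concentration would indicate that only a fraction of the protein preparation is functional.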
Peptide Inhibition Assay-To determine the affinity of ΔIBB importin α for a small peptide NLS, the inhibition constant of this peptide in the binding of SV40-GFP to ΔIBB importin α was measured. The binding curve for SV40-GFP was measured in the presence of four different concentrations of the SV40A5 peptide ranging between 5 and 100 μM. The resulting binding curves were then simultaneously fit to an equation for the fraction NLS-GFP bound, Y, as a function of Kd for SV40-NLS, Ki for the peptide, the total NLS-GFP concentration, the total importin α concentration, and the total concentration of the peptide. There are three solutions to this equation, two of which are physically useful, that correspond to the situations where Kd > Ki and where Ki > Kd. Using the latter case, the binding affinity between the SV40A5 peptide and ΔIBB importin α was calculated (data not shown).
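For readers who wish to reproduce a competition curve of this kind without the closed-form (cubic) solution, the same mass-balance problem can be solved numerically. The sketch below is an assumption-laden alternative, not the equation used in this work: it assumes simple 1:1 competition of the NLS-GFP probe and the peptide for a single site, finds the free importin α concentration by bisection, and uses hypothetical function and parameter names.

```python
def free_importin(pt, at, bt, kd, ki, iters=100):
    """Free importin concentration from the mass balance, by bisection.

    pt: total importin, at: total NLS-GFP (affinity kd),
    bt: total competitor peptide (affinity ki); all in the same unit.
    """
    lo, hi = 0.0, pt
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Total importin implied by this free concentration (monotonic in mid).
        total = mid + at * mid / (kd + mid) + bt * mid / (ki + mid)
        if total > pt:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fraction_bound_competitive(pt, at, bt, kd, ki):
    """Fraction of NLS-GFP bound in the presence of a competing peptide."""
    p = free_importin(pt, at, bt, kd, ki)
    return p / (kd + p)

# Made-up example in nM: probe Kd = 10, peptide Ki = 1000.
y_no_peptide = fraction_bound_competitive(10.0, 0.001, 0.0, 10.0, 1000.0)
y_with_peptide = fraction_bound_competitive(10.0, 0.001, 50000.0, 10.0, 1000.0)
```

Because the implied total is strictly increasing in the free concentration, bisection always converges to the single physically meaningful root, which avoids having to choose among the algebraic solutions.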
Theoretical Energy Estimation-Using the published three-dimensional structures of the SV40 NLS and the Myc NLS in complex with ΔIBB importin α (12,28), a theoretical estimation of the relative free energy contribution of each residue of each NLS in the binding reaction with importin α was calculated. To perform a rigorous, atomic-level free energy simulation with this system would require an enormous effort, and such an effort would not yield much higher precision or accuracy than simpler, approximate methods. Thus, the relative free energy contributions reported here were generated using a number of reasonable methods based on approximations. In this calculation, we estimate the change in the binding energy, or ΔΔG, when a residue of the NLS is substituted with alanine. Thus, only terms that will differ between the binding of similar but distinct NLS variants need be considered. The free energy of binding between the NLS and importin α can be expressed as a summation of several terms including: (i) hydrophobic entropy, (ii) van der Waals interactions, (iii) hydrogen bonds, (iv) electrostatic interactions, and (v) conformational free energy. Hydrophobic entropy and van der Waals interactions are roughly proportional to the change in buried surface area of both molecules upon forming a complex. Hydrogen bonds and electrostatic interactions can be estimated by determining the interactions between the electrostatic potential of the protein with the peptide. We assume that the conformational change required of importin α in binding an NLS will be the same as that for binding its variants, i.e. we assume that importin α binds to each alanine mutant of an NLS in an identical conformation. The merits of this assumption are discussed below under "Results and Discussion." Thus, the contribution of conformational free energy to ΔΔG derives from the different conformations that the variant NLSs adopt in the unbound state.
To render this calculation feasible, we assume that the unbound NLS exists in two states: a random coil comprising multiple iso-energetic conformations, or an α helix. As the NLS must adopt a nonhelical, extended conformation to bind to importin α, the relative helicity of each NLS variant will have a negative impact on the relative binding affinity for importin α.
To obtain a crude estimation of the theoretical contributions of various terms to the relative binding energies of variant NLS sequences, the relative changes in buried surface area, electrostatic interactions, and helicity for each NLS variant were calculated as follows.
Buried Surface Area-The surface area buried upon the binding of an NLS to importin α was calculated from the crystal structures for the SV40-NLS-importin α complex (Protein Data Bank code 1BK6) and the Myc-NLS-importin α complex (Protein Data Bank code 1EE4). Areas were calculated with the CNS program using the standard protein parameters therein (35). To calculate the buried surface area, the NLS was first separated from importin α and the surface area of each molecule was obtained. Next, the surface area of the complex was calculated and subtracted from the sum of the individual, unbound surface areas. Both the protein and peptide were maintained in the same conformation in both the bound and unbound state in this calculation. Although the conformations of both NLS and importin α undoubtedly change upon forming a complex, we assume that these changes will not have a significant effect when comparing the buried surface area from one NLS variant to the next. For each NLS alanine variant, the calculation was repeated with the appropriate NLS side chain atoms omitted.
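Written out, the bookkeeping described in this paragraph amounts to the following. The symbols are introduced here for illustration only and do not appear in the original; the sign convention for the per-variant difference is likewise an assumption.

```latex
% A_X: solvent-accessible surface area of species X, all in the bound conformation
A_{\mathrm{buried}} = A_{\mathrm{NLS}} + A_{\mathrm{importin}} - A_{\mathrm{complex}}

% change for alanine variant i relative to the wild type sequence (assumed sign)
\Delta\Delta(\text{buried surface})_i
  = A_{\mathrm{buried}}^{\mathrm{wt}} - A_{\mathrm{buried}}^{\,i}
```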
Electrostatic Energy-The change in electrostatic interaction between the NLS and importin α with each alanine substitution was estimated first by calculating the electrostatic potential for the NLS-importin α complex with the appropriate side chain atoms removed. The electrostatic potential was calculated using the Poisson-Boltzmann method in the program GRASP with partial atomic charges taken from the CNS protein parameter file (35,36). The electrostatic potential was calculated at the position of each atom from the removed side chain. Then this potential was multiplied by the partial charges of each of the side chain's atoms, yielding the approximate energy of interaction between the side chain and the rest of the complex.
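In symbols, the estimate described here is a charge-weighted sum of the potential over the atoms of the removed side chain. The notation below is an illustrative assumption, not taken from the original.

```latex
% q_j: partial charge of side chain atom j
% \phi(\mathbf{r}_j): potential of the complex (side chain removed) at the position of atom j
E_{\mathrm{elec}} \approx \sum_{j \,\in\, \text{side chain}} q_j \, \phi(\mathbf{r}_j)
```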
Helicity-The fractional helicity of each NLS variant sequence was calculated using the program AGADIR (37). The entire N-terminal sequence of each variant as shown in Table II was used as the input into this program and the helical content of each peptide sequence was obtained. We assume that the energy necessary to unravel the peptide into an extended conformation for binding to importin α is proportional to the helical content of the peptide.
Deconvolution of Energetic Terms-The approximations involved with calculating these three energetic terms preclude the possibility of calculating theoretical free energies a priori. However, each of these terms gives a relative scale for the contribution of each term to the ΔΔG of binding of each variant NLS. The values obtained are proportional to ΔΔG and are comparable within each term, but the three energy terms calculated above cannot be directly compared with each other. These calculations can be used to determine the relative importance of each term in the binding of each residue of the NLS to importin α.
To determine the relative contributions of each term to NLS binding, a simple linear deconvolution was performed. For each of the two NLS sequences, the SV40 and the Myc NLSs, ΔΔG values were calculated for each alanine variant in comparison to the wild type sequence. Assuming that the three energy terms calculated above, buried surface, electrostatic interaction, and helicity, are proportional to ΔΔG and are internally consistent, then ΔΔG can be expressed as a linear function of these three terms. Suppose the vector ΔΔG = (ΔΔG_1 ... ΔΔG_i), where ΔΔG_i is the experimentally measured ΔΔG value for the NLS alanine mutant i. Then, given vectors for the calculated terms ΔΔ(buried surface), ΔΔ(electrostatics), and ΔΔ(helicity), ΔΔG can be fit to Equation 2,

$$\Delta\Delta G = C_{surf}\,\Delta\Delta(\text{buried surface}) + C_{elec}\,\Delta\Delta(\text{electrostatics}) + C_{hel}\,\Delta\Delta(\text{helicity}) \qquad (2)$$

where C_surf, C_elec, and C_hel are scalar constants. If a reasonable fit is achieved, then the resulting terms C_surf ΔΔ(buried surface), C_elec ΔΔ(electrostatics), and C_hel ΔΔ(helicity) give the relative contributions of each energy term to the binding of each residue of the NLS to importin α.
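The linear deconvolution can be illustrated with an ordinary least-squares fit of three coefficients, with no intercept, as the description of ΔΔG as a linear combination of the three terms implies. The sketch below is a minimal pure-Python version; the function names are hypothetical and all numbers are made up purely to show the mechanics (they are not values from this study).

```python
def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_coefficients(surf, elec, hel, ddg):
    """Least-squares C_surf, C_elec, C_hel via the normal equations (no intercept)."""
    cols = [surf, elec, hel]
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    atb = [sum(p * y for p, y in zip(ci, ddg)) for ci in cols]
    return solve3(ata, atb)

# Made-up relative terms for six hypothetical alanine variants (arbitrary units).
surf = [0.2, 0.9, 1.5, 0.7, 0.4, 0.1]
elec = [0.1, 0.3, 1.4, 0.2, 0.6, 0.0]
hel = [0.3, 0.1, 0.2, 0.5, 0.4, 0.1]
# Synthetic "experimental" values built from known coefficients (1.0, 2.0, -0.5).
ddg = [1.0 * s + 2.0 * e - 0.5 * h for s, e, h in zip(surf, elec, hel)]
c_surf, c_elec, c_hel = fit_coefficients(surf, elec, hel, ddg)
```

Multiplying each fitted coefficient by its term vector then gives the per-residue contribution of that term, which is the quantity compared across residues in the analysis.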

Energetic Contributions of Each Residue in a Monopartite NLS-Nuclear localization signals are an apparently diverse set of sequences with a generally polybasic character (7,8). To obtain a more detailed description of the essential features of a monopartite NLS, the energetic contributions of each residue in two classical NLS sequences (the SV40 NLS and the Myc NLS) to the binding of the NLS to yeast importin α were measured through an alanine scan. To simplify the interpretation and collection of the data, the affinity of each NLS variant was measured for binding to an importin α fragment (ΔIBB importin α) that lacks the auto-inhibitory N-terminal importin β binding domain. This fragment is identical to the one that was crystallized in complex with both the SV40 and the Myc NLS peptides (12), facilitating direct comparison of the thermodynamic data to the atomic structure.
Each nonalanine residue of both the Myc and SV40 NLS sequences was mutated one at a time to alanine, and the affinity of each of these mutant NLS sequences for ΔIBB importin α was measured. Each NLS sequence was fused to the N terminus of GFP (see Table II). The binding of this NLS-GFP fusion to ΔIBB importin α was measured by monitoring the fluorescence depolarization of GFP while titrating the fused NLS with ΔIBB importin α. The resulting binding curves were then fit to obtain a dissociation constant for the NLS-GFP/ΔIBB importin α interaction (Fig. 2). The change in the affinity of each NLS variant is plotted as a ΔΔG value in Fig. 3, illustrating the energy profile for each NLS sequence.
As expected from previous functional and structural studies, a single lysine dominates the energetic profile of the monopartite NLS (8,12,25). This lysine corresponds to Lys-3 in the SV40 NLS and Lys-4 in the Myc NLS, where both residues bind to pocket 1 of the protein (numbered from Table I, see Fig. 1). Mutation of this residue in the SV40 NLS destroys its function as a nuclear localization signal (8). In this pocket, importin α makes three hydrogen bonds with the NLS lysine Nε, including one with the charged importin α Asp-203 side chain. In addition, there are numerous contacts between the protein and the hydrophobic areas of the lysine side chain. To further characterize this site, three additional substitutions at this position were generated. The threonine substitution (SV40T3) is identical to the loss-of-function mutation tested in vivo (8). This mutant NLS bound with affinity nearly identical to that of the alanine mutation. A methionine substitution (SV40M3) was generated to test whether the hydrophobic surface of a long side chain could contribute a significant amount of binding energy. Surprisingly, this variant bound more weakly than the alanine and threonine variants. Finally, an arginine was substituted at this position (SV40R3) to test whether the site was specific for lysine or could accommodate either basic residue. Although this variant bound more tightly than the alanine variant, its affinity for ΔIBB importin α was significantly weaker than that of the wild type SV40 sequence. Thus, this pocket appears to be fairly specific for a lysine residue.
We measured a 3 μM dissociation constant for the SV40A3 variant. Thus, regardless of what conformation or register this variant adopts when binding to importin α, the tightest binding mode possible between the NLS and the receptor yields a binding constant of 3 μM. As shown in Table III, if the SV40A3 variant were to shift the position of its binding by one residue in the N-terminal direction (register −1), the specific positions of lysines and alanines with regard to the five binding pockets of importin α would be somewhat similar to the binding of the SV40A2 variant in the standard register. However, SV40A2 has the much tighter Kd of 17 nM, 2 orders of magnitude tighter than the binding of the SV40A3 variant. The large difference in the binding affinity of SV40A3 in the −1 register compared with the SV40A2 variant suggests that at least one of the substitutions shown in the SV40A3, register −1 mode of binding costs at least 3 kcal/mol in binding energy. These substitutions include a Lys → Arg at position 2, a Lys → Val at position 4, and a Val → Glu at position 5. If these substitutions are less deleterious than the alanine substitution at position 1 (in the standard register), then the binding mode of the SV40A3 variant observed would be in the −1 register rather than the mode observed in the crystal structure. From the sequence of the Myc NLS, one can conclude that the Lys → Arg at position 2 can be accommodated. Thus the combination of Val at position 4 and Glu at position 5 must cost at least 3 kcal/mol. Similarly, one can compare SV40A3 in the +1 register to SV40A4 and conclude that the combination of the Lys → Pro at the −1 position, the Arg → Lys at position 3, the Lys → Arg at position 4, and the Val → Lys at position 5 must cost at least 1.2 kcal/mol in binding energy. Thus, the register of binding the polybasic SV40 NLS is most probably set by the specificity of the −1 position, position 4, and/or position 5.
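The energy bound quoted in this argument follows from the ratio of the two dissociation constants through the standard relation ΔΔG = RT ln(Kd,variant / Kd,reference). The short sketch below evaluates it at the assay temperature of 25°C; the function and constant names are hypothetical and chosen only for this illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def ddg_kcal(kd_variant, kd_reference, temp_k=298.15):
    """Change in binding free energy (kcal/mol) from a ratio of dissociation constants.

    Both constants must be in the same unit; a positive value means weaker binding.
    """
    return R_KCAL * temp_k * math.log(kd_variant / kd_reference)

# SV40A3 (3 uM = 3000 nM) compared with SV40A2 (17 nM), both values from the text.
penalty = ddg_kcal(3000.0, 17.0)
```

With these two values the result is a little over 3 kcal/mol, which matches the lower bound stated above.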
Thus, the large variation in binding energy observed for sequential alanine substitutions in the NLS suggests that all the alanine variants bind in an identical conformation to that observed for the wild type sequences in the crystal structures.
An important conclusion drawn from these data is that the requirements for a monopartite NLS sequence are more specific than a simple cluster of basic residues. Our results are consistent with previous functional and structural studies, suggesting a basic core of the monopartite NLS with a sequence K(K/R)X(K/R) (8,12,25). The analysis of possible register shifts above suggests further requirements in the residues preceding the N-terminal anchoring lysine and/or at the C terminus of the basic core; however, the elucidation of the specific character of these terminal positions will require further study.
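As an illustration of how the K(K/R)X(K/R) core might be used to scan a sequence, the sketch below searches for the motif with a regular expression. This is only a first-pass filter under stated assumptions: the function name is hypothetical, and, as noted above, a match to the core alone does not establish a functional NLS because the flanking positions also appear to matter.

```python
import re

# K, then K or R, then any residue, then K or R; the lookahead reports overlapping hits.
CORE = re.compile(r"(?=K[KR].[KR])")

def core_positions(seq):
    """1-based positions of the lysine that starts each K(K/R)X(K/R) match."""
    return [m.start() + 1 for m in CORE.finditer(seq)]

sv40 = core_positions("PKKKRKV")               # SV40 large T antigen NLS
myc = core_positions("PAAKRVKLD")              # Myc NLS
sv40_t3 = core_positions("PKTKRKV")            # nonfunctional Lys-3 to Thr variant
nucleoplasmin = core_positions("KRPAATKKAGQAKKKKL")
```

For the sequences quoted in this paper, the critical lysines (Lys-3 of SV40 and Lys-4 of Myc) are among the reported start positions, and the nonfunctional threonine variant yields no match.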
Modeling Energetic Terms-With the availability of the atomic structures of the NLS-importin α complexes, it seemed a reasonable goal to attempt to correlate the experimental data with the crystal structure in a quantitative manner. With such an analysis, the energetic contributions of each residue might be further dissected into terms such as hydrophobic entropy and electrostatic interactions. As a rigorous free energy simulation with a system as large as the NLS-importin α complex would be technically difficult, a more approximate approach was taken. The goal of this analysis was to calculate individual terms for the ΔΔG values obtained in the alanine scanning data and then attempt to correlate these calculated energies with the experimentally measured quantities. Three energetic terms were considered. First, the change in the surface area buried in the complex between the NLS and importin α was calculated as a relative measure of van der Waals interactions along with entropic interactions with the solvent (including the hydrophobic effect). Second, the interaction of each side chain in the NLS with the surrounding electrostatic potential was calculated as a relative measure of the electrostatic and hydrogen bonding interactions between the peptide side chains and the surrounding protein. Finally, the helical contents of the various NLS sequences were calculated as a relative measure of the differences in the conformational energy of the unbound NLS sequence that must be overcome to bind to importin α. These three terms were then considered to be internally consistent and proportional to the free energy such that a linear combination of the three terms should be equivalent to the total ΔΔG values obtained by experimentation. As described under "Experimental Procedures," three scalar coefficients for the three terms were calculated by fitting the linear combination of these terms to the experimental data.
The results of these calculations are illustrated in Fig. 4. Although the experimental data from the Myc NLS correlate well with the theoretical data, the SV40 calculations did not produce as good a fit. When one set of coefficients was fit to all the data (both SV40 and Myc alanine scans), the calculated ΔΔG correlated fairly well with the experimental data (r = 0.83 overall, r = 0.97 Myc data alone); however, ΔΔG for most of the variants was underestimated (Fig. 4). This is primarily due to the overestimation of the theoretical terms describing the SV40A5 and SV40A2 variants. As shown in Fig. 4B, when the three terms are separated, it becomes apparent that the buried surface area due to the residues in the SV40A5 (Arg) and SV40A2 (Lys) variants is much more substantial in the crystal structure than expected from the experimental data, where both alanine substitutions had a very modest effect.
When the coefficients for the three energy terms were fit for just the Myc data alone, the correlation coefficient for these data increased to r = 0.996. When these coefficients are applied to all the calculations, the resulting energies are shown in Fig. 4C. The Myc data are fit nearly perfectly, but again, the calculated energies for SV40A2 and SV40A5 were significantly overestimated. Although the overestimation for these residues cannot be presently explained, the fit obtained for the rest of the data (r = 0.94 excluding SV40A2 and SV40A5) suggests that these calculations provide useful information.
When the data are fit using just the Myc alanine scan, the buried surface area yields the dominant energy term for the free energy profiles of the NLSs (Fig. 4D). For both the Myc and the SV40 NLS, the electrostatic term is comparable to the surface area term for the residues bound in pocket 1 (SV40A3 and MycA4). When each of these terms is individually fit to the experimental data, the surface area term yields a correlation coefficient of r = 0.7, whereas the electrostatic term yields a correlation of r = 0.73. When the fit is performed using a linear combination of the surface area and the electrostatic terms (omitting the helicity term), the correlation yields r = 0.82 compared with r = 0.83 when all three terms are used. The helicity term appears to be insignificant in comparison to the other two terms. When considering the Myc NLS data alone, the buried surface area is even more dominant. The fit of the surface area term alone to the experimental data yields an r = 0.95, the electrostatic term alone yields r = 0.85, and the linear combination of surface area with electrostatics yields an r = 0.985 (compared with r = 0.996 with all three terms). Thus, in a majority of our variants, the experimentally determined binding affinities correlate well with the amount of surface that is observed to be buried in the crystal structure of the NLS-importin α complex.
The Additional Energy of a Bipartite Sequence-A majority of the putative NLS sequences recognized to date appear to be bipartite in structure (7). These sequences have a loose consensus of two basic clusters separated by 10-12 residues. Crystallographic analysis suggests that importin α binds these two basic clusters in two unique binding sites (28). The C-terminal cluster of the NLS binds to the monopartite-NLS binding site at the N terminus of the importin α Armadillo domain. The N-terminal cluster of the bipartite NLS binds to a smaller site near the C terminus of importin α. It has been shown that adding a cluster of two basic amino acids 10-12 residues upstream of a nonfunctional or very weak NLS (for example the SV40T3 variant) converts the sequence into a functional signal (27). We have analyzed this effect quantitatively using our SV40T3 variant.
FIG. 4. Theoretical contributions of energetic terms to the binding affinity between importin α and an NLS. Using the crystal structures of the importin α-NLS complexes, the relative theoretical contributions of buried surface area, electrostatic interactions, and helicity to the ΔΔG values for each alanine scan mutant were calculated as outlined in the text. With the assumption that the experimental ΔΔG values should be a linear combination of the three terms, the three theoretical ΔΔG profiles for both NLS sequences were fit to the experimental data. The resulting theoretical and experimental ΔΔG values are compared in A. In C a similar fit was performed, but only the data from the Myc NLS were used in the fit. The coefficients from this fit were then used to calculate the theoretical values for both NLS sequences as shown. The contributions of each energetic term to the theoretical ΔΔG values shown in A and C are shown individually in B and D, respectively, yielding a relative importance of each term for each residue in the NLS.
As shown in Table II, when two basic residues are added at the appropriate distance N-terminal to the SV40T3 sequence, the binding affinity of this sequence for ΔIBB importin α increases from 3 μM for the monopartite sequence to 13.5 nM for the bipartite variant (BPSV40T3). This further illustrates that there is a correlation between the ability of an NLS to function in vivo and the ability of the sequence to bind to importin α. This example also yields a measurement of the additional binding energy conferred by the addition of a second basic cluster to a monopartite NLS. The addition of two basic residues upstream of the monopartite sequences of the SV40T3, SV40A4, and SV40A6 variants contributed an average of ~3.1 kcal/mol to the binding affinity for ΔIBB importin α. The addition of this second cluster to the SV40 and SV40A5 sequences produced NLS-GFP fusions whose affinity for ΔIBB importin α was too tight to measure accurately using our methods (Kd < 0.5 nM). Thus, the inclusion of the upstream basic cluster dramatically increases the variety of sequences that can serve as a functional NLS.
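As a rough check on the numbers above, the energy gained from the second basic cluster can be estimated from the ratio of the two dissociation constants using ΔΔG = RT ln(Kd,weak / Kd,tight). The sketch below assumes a temperature of 298.15 K, which is not stated in this section, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_kd(kd_weak, kd_tight, temp_k=298.15):
    """Binding energy difference (kcal/mol) between two dissociation constants.

    temp_k is an assumed temperature; both Kd values must share the same units.
    """
    return R_KCAL * temp_k * math.log(kd_weak / kd_tight)

# SV40T3 (3 uM) versus BPSV40T3 (13.5 nM), values quoted in the text.
ddg_t3 = ddg_from_kd(3e-6, 13.5e-9)
print(f"Energy gained from second basic cluster: {ddg_t3:.1f} kcal/mol")
```

Under these assumptions the single SV40T3 pair gives roughly 3.2 kcal/mol, in line with the ~3.1 kcal/mol average reported over the three variants.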
Interestingly, the addition of the two basic residues to the GFP control protein yielded a fusion (BP-GFP) with measurable affinity for ΔIBB importin α (see Table II). This fusion binds to ΔIBB importin α with a Kd of 2 μM, which is nearly identical to the binding constant for the SV40A3 NLS variant. There are two probable binding modes for the complex of the BP-GFP variant with ΔIBB importin α. The first mode would have the newly added basic cluster (with the sequence KR) binding to the monopartite binding site of ΔIBB importin α with a lysine in the pocket 1 position. A second possible mode would make use of two arginines from the original GFP vector positioned 10 residues down from the newly added KR cluster. This second arrangement would suggest that the original GFP protein from our vector, as a model of a random peptide, binds to ΔIBB importin α ~3.1 kcal/mol more weakly than the BP-GFP variant. One can then conclude that a random peptide without a stable tertiary structure would bind to ΔIBB importin α with a Kd of around 0.3 mM.
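The extrapolation to a random peptide can be reproduced by shifting the BP-GFP dissociation constant by the energy of the second basic cluster, Kd,random = Kd,BP-GFP · exp(ΔΔG / RT). This is a sketch under the assumption of T = 298.15 K (not given in this section); the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_after_energy_loss(kd_ref, ddg_kcal, temp_k=298.15):
    """Kd expected when binding is weakened by ddg_kcal relative to kd_ref.

    temp_k is an assumed temperature.
    """
    return kd_ref * math.exp(ddg_kcal / (R_KCAL * temp_k))

# BP-GFP Kd of 2 uM, weakened by the ~3.1 kcal/mol cluster energy from the text.
kd_random = kd_after_energy_loss(2e-6, 3.1)
print(f"Estimated Kd for a random peptide: {kd_random * 1e3:.2f} mM")
```

With these inputs the estimate falls in the range of a few tenths of a millimolar, consistent with the "around 0.3 mM" figure given above.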
The crystal structure of a bipartite NLS bound to ΔIBB importin α revealed numerous sequence nonspecific interactions with the backbone of the polypeptide chain (28). To determine whether these interactions are important in the specific interaction of a monopartite NLS sequence with importin α, the affinity of ΔIBB importin α for an 11-residue peptide modeled after the SV40A5 sequence was determined. The binding isotherm for SV40-GFP and ΔIBB importin α was measured in the presence of various concentrations of the SV40A5 peptide. The Kd for the interaction of the peptide with ΔIBB importin α was determined by a nonlinear least-squares fit of the various binding curves to a function derived by the simultaneous solution of the two independent binding equilibria. The fitted Kd for the peptide was determined to be ~10 μM compared with the SV40A5-GFP binding constant of 38 nM. The binding constant for the peptide alone is 250-fold weaker than that of the fusion protein. This discrepancy suggests that the interaction between the NLS and importin α is dependent, in part, on flanking sequences. We believe that this difference in binding energy (~3.2 kcal/mol) is contributed through sequence nonspecific interactions between the residues N-terminal to the monopartite sequence and importin α, as observed in the atomic structure of a bipartite NLS bound to ΔIBB importin α (28).
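The competition experiment rests on two binding equilibria that share one receptor. The sketch below is one way to evaluate such a model numerically; it is an illustration, not the fitting function used in this study. It assumes 1:1 binding at a single site, solves the mass balance for free receptor by bisection, and uses hypothetical function names and placeholder concentrations.

```python
def free_receptor(r_total, l_total, kd_l, p_total, kd_p):
    """Free receptor concentration for two ligands competing for one site.

    Solves r_total = R + l_total*R/(kd_l + R) + p_total*R/(kd_p + R) for R.
    The right-hand side increases monotonically with R, so bisection on
    [0, r_total] converges to the unique root.
    """
    lo, hi = 0.0, r_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        total = mid + l_total * mid / (kd_l + mid) + p_total * mid / (kd_p + mid)
        if total > r_total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fraction_labeled_bound(r_total, l_total, kd_l, p_total, kd_p):
    """Fraction of the labeled ligand (e.g. an NLS-GFP fusion) that is bound."""
    r = free_receptor(r_total, l_total, kd_l, p_total, kd_p)
    return r / (kd_l + r)

# Placeholder concentrations in molar (illustrative, not from the paper).
f_no_competitor = fraction_labeled_bound(10e-9, 1e-12, 10e-9, 0.0, 10e-6)
f_with_competitor = fraction_labeled_bound(10e-9, 1e-12, 10e-9, 100e-6, 10e-6)
```

A peptide Kd could then be estimated by minimizing the squared difference between `fraction_labeled_bound` and the measured isotherms over `kd_p`, which is the general shape of a nonlinear least-squares fit to competition data.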
Auto-inhibition of NLS Binding by the IBB Domain-The crystal structure of full-length murine importin α revealed that the flexible, largely unstructured N-terminal importin β-binding domain contains NLS-like sequences that, in the absence of other proteins, will bind in the monopartite NLS binding site (18). We and others have previously shown that the inclusion of this domain inhibits the binding of monopartite NLS sequences to full-length importin α and that this inhibition is relieved in the presence of importin β (16,17,19). A model for these observations is that the IBB domain of importin α competes with monopartite NLS sequences for the NLS binding site. However, importin β binds tightly to this N-terminal IBB domain and sequesters it from the NLS binding site. The removal of this intramolecular competitive ligand increases the effective affinity of importin α for the NLS in trans.
Although we observed no binding between an SV40-GFP fusion and full-length importin α, the competitive inhibition of the IBB domain could, in principle, be overcome by increasing the affinity of the NLS sequence for importin α. To test this conjecture, the affinity of full-length importin α for the high-affinity BPSV40-GFP fusions was determined. The binding of BPSV40, BPSV40A4, -A5, and -A6 to full-length importin α was measurable through the fluorescence depolarization assay (Table IV).
To compare the affinities of NLS-GFP fusion proteins for both ΔIBB importin α and full-length importin α, the functional stoichiometry of the protein preparations needed to be confirmed. To this end, an end point titration was performed for both ΔIBB importin α and full-length importin α using BPSV40-GFP at high concentration as a standard. From this assay, 1 mol of full-length importin α was functionally equivalent to 0.96 mol of ΔIBB importin α. This assay confirms that both ΔIBB importin α and full-length importin α are equally functional and folded properly. Thus, the affinities of these two proteins may be directly compared.
The binding affinity of ΔIBB importin α for four SV40 NLS variants (SV40, SV40A4, -A5, and -A6) is compared with the affinity of full-length importin α for the bipartite versions of the same SV40 variants in Fig. 5. The relative change in binding energy with each alanine mutation is nearly identical for the SV40 variants and their bipartite counterparts. This suggests that the energy gained by the addition of the second basic cluster is nearly equivalent to the energy cost of the competition with the IBB domain. The profile in Fig. 5 additionally suggests that these energy modifications are independent of the specific sequence of the monopartite NLS. This conclusion is not entirely unexpected. The IBB domain should have the same intramolecular binding energy regardless of the NLS against which it competes. In addition, the distance between the second basic cluster and the monopartite sequence is large enough to expect these segments to interact with importin α independently from each other. With these assumptions, the intramolecular binding energy of the IBB domain in competition with an NLS in trans can be estimated from the differences in the affinities in Fig. 5. The binding energy of a bipartite variant interacting with full-length importin α is ~0.4 ± 0.3 kcal/mol stronger than the energy of the same monopartite variant interacting with ΔIBB importin α. Given the estimate for the interaction energy added with the second basic cluster of 3.2 kcal/mol, the intramolecular competition of the IBB domain reduces the binding affinity for an NLS by ~2.8 kcal/mol. With this simplistic view of the auto-inhibitory behavior of importin α, the function of this intramolecular inhibition is not immediately obvious. A common hypothesis is that this intramolecular competition for the NLS binding site is responsible for the delivery of the cargo in the nucleus.
Upon reaching the nuclear basket of the pore, the trimeric complex of importin α, importin β, and the NLS cargo encounters the small GTPase Ran in its GTP-bound state. Ran-GTP binds to importin β, which, through a conformational change, releases importin α from importin β (3). The release of the IBB domain from importin β effectively reduces the affinity of importin α for the NLS cargo through competitive inhibition, and the cargo is released into the nucleoplasm.
This intuitive model breaks down when one considers the binding energies measured in this study. What happens when a cargo contains an NLS with an extremely high affinity for importin α such as the BPSV40 NLS? The affinity of the BPSV40 NLS for the inhibited full-length importin α is as high as the affinity of a functional SV40 NLS for the uninhibited ΔIBB importin α. If ΔIBB importin α is considered a model for the importin α-importin β complex, then the BPSV40 NLS should bind to importin α in the nucleus with the same strength as the functional SV40 NLS binds to the importin α-importin β complex in the cytoplasm and nuclear pore. This suggests that the release of these high affinity cargoes, and thus the transport of these cargoes, would be fairly inefficient if the mechanism for nuclear release was dependent solely on the competitive inhibition of the importin α IBB domain. Such high affinity NLS sequences may not occur in vivo; indeed, the sequence of the BPSV40 NLS is completely artificial, but this model for release based on the auto-inhibition of importin α suggests that abnormally high affinity NLS sequences would have to be avoided in a natural setting. A plausible alternative would be that the auto-inhibition observed in vitro is just one element of a more active release mechanism that can accommodate high affinity NLS sequences. So far, evidence for such an active mechanism has not been observed.
In Vitro Energy Scale for Nuclear Localization-One goal of this quantitative analysis is to provide a thermodynamic foundation for a numerical model of the process of nuclear transport. One would expect that the in vivo process of nuclear import would correlate in some manner to the energetics of the individual protein-protein interactions that drive the process. It has been suggested that the initial rate of protein import is linearly correlated with the equilibrium constant for the interaction between the NLS cargo and importin α (30,34,38). This relationship would hold true in a simple model where the rate of protein import would depend on the equilibrium concentration of the importin α-importin β-cargo NLS ternary complex. One correlation that is clear from numerous studies is that there is some sort of functional threshold of affinity that an NLS must possess for importin α for the cargo to be imported into the nucleus. When the SV40 NLS is mutated to the SV40T3 sequence, its affinity for importin α decreases by ~3 kcal/mol and it also loses its ability to function as a nuclear localization signal in vivo (8). Given the arguments proposed above regarding the function of importin α auto-inhibition in cargo release, it will be interesting to see whether there are both lower and upper thresholds for the binding energy of a functional NLS.
The quantitative data presented here yield a numerical skeleton on which to build a comprehensive model for this complicated process. With the assumptions made above, a given NLS can be situated on a linear scale that describes its affinity for the importin α-importin β complex (using ΔIBB importin α as a model) as well as its affinity for importin α alone. For an NLS to function in nuclear import, one might hypothesize that it must have an affinity for the importin α-importin β complex that is tight enough to stimulate the uptake of the NLS cargo into the nuclear pore, but it must also have an affinity for lone importin α that is weak enough to allow efficient release of the cargo into the nucleus. A scale of the values measured and calculated here is illustrated in Fig. 6. We are currently undertaking experiments to correlate the thermodynamic values for these interactions with the kinetics of nuclear import in vivo.
FIG. 5. A comparison of the binding energy increase in a bipartite NLS to the energy decrease resulting from auto-inhibition. The binding affinities of monopartite NLS variants for ΔIBB importin α are compared with the binding affinities of full-length importin α for the bipartite versions of the same NLS variants. Energies are calculated as −kT ln Kd.
FIG. 6. An in vitro energy scale for nuclear localization. The relative dissociation constants, both measured and calculated, for various peptide sequences in complex with importin α are shown on two logarithmic scales. On the left is a scale representing the affinity for the importin α-importin β complex (as approximated using ΔIBB importin α). The scale to the right represents the affinity for full-length importin α alone (with inhibition from the IBB domain). Using the current model for protein import, the left scale represents the ability of each peptide to direct the cytoplasmic capture and translocation of the cargo to which the peptide is fused. The scale on the right would then represent the efficiency with which the cargo would be released in the nucleus. Functional studies suggest a lower threshold for a functional NLS on these scales, which exists between the affinities of the SV40 NLS and the SV40T3 variant. This model predicts the possible existence of an upper threshold for the affinity of an NLS sequence above which nuclear release by importin α would be inefficient.