Structural Basis for the Specificity of Bipartite Nuclear Localization Sequence Binding by Importin-α

Importin-α is the nuclear import receptor that recognizes cargo proteins carrying conventional basic monopartite and bipartite nuclear localization sequences (NLSs) and facilitates their transport into the nucleus. Bipartite NLSs contain two clusters of basic residues, connected by linkers of variable lengths. To determine the structural basis of the recognition of diverse bipartite NLSs by mammalian importin-α, we co-crystallized a non-autoinhibited mouse receptor protein with peptides corresponding to the NLSs from human retinoblastoma protein and Xenopus laevis phosphoprotein N1N2, containing diverse sequences and lengths of the linker. We show that the basic clusters interact analogously in both NLSs, but the linker sequences adopt different conformations, whereas both make specific contacts with the receptor. The available data allow us to draw general conclusions about the specificity of NLS binding by importin-α and facilitate an improved definition of the consensus sequence of a conventional basic/bipartite NLS (KRX(10-12)KRRK) that can be used to identify novel nuclear proteins.
Nucleocytoplasmic transport occurs through nuclear pore complexes, large proteinaceous structures that penetrate the double lipid layer of the nuclear envelope. Most macromolecules require an active, signal-mediated transport process that enables the passage of particles up to 25 nm in diameter (~25 MDa). The best characterized nuclear targeting signals are the conventional nuclear localization sequences (NLSs)¹ that contain one or more clusters of basic amino acids (1). The NLSs fall into two distinct classes: monopartite NLSs, containing a single cluster of basic amino acids, and bipartite NLSs, comprising two basic clusters separated by a spacer.
Despite the variability, the conventional basic NLSs are recognized by the same receptor protein termed importin or karyopherin, a heterodimer of α and β subunits (for recent reviews, see Refs. 2-4). Importin-α (Impα) contains the NLS-binding site, and importin-β (Impβ) is responsible for the translocation of the importin-substrate complex through the nuclear pore complex. Once inside the nucleus, Ran-GTP binds to Impβ and causes the dissociation of the import complex. Impα becomes autoinhibited, and both importin subunits return to the cytoplasm separately without the import cargo. The directionality of nuclear import is conferred by an asymmetric distribution of the GTP- and GDP-bound forms of Ran between the cytoplasm and the nucleus. This distribution is in turn controlled by various Ran-binding regulatory proteins.
Impα consists of two structural and functional domains, a short basic N-terminal Impβ-binding domain (5-7) and a large NLS-binding domain built of armadillo (Arm) repeats (8). The structural basis of monopartite and bipartite NLS recognition by Impα has been studied crystallographically in yeast and mouse Impα proteins (9-11). The two basic clusters of the bipartite NLSs bind to two separate binding sites on Impα, involving Arm repeats 1-4 and 4-8, respectively. Monopartite NLSs can bind in both sites but primarily use the binding site corresponding to the C-terminal basic cluster of the bipartite NLSs, referred to as the major site (9,11).
The structure of full-length Impα indicated that the major NLS-binding site is occupied by residues 44-54 from the N-terminal region of the protein (Impβ-binding domain) that resembles an NLS (12); Impα is therefore autoinhibited in the absence of Impβ. This observation is supported by the measurements of higher NLS binding affinity by Impα/β as compared with Impα alone (13-22). The significance of the autoinhibitory mechanism has been confirmed by in vivo studies (23).
In this study, we present the crystal structures of peptides corresponding to the bipartite NLSs from human retinoblastoma protein (RB) and Xenopus laevis chromatin assembly factor N1N2 bound to mouse Impα. These NLSs were chosen to represent diverse sequences and different lengths of the linkers between the clusters, so that some general conclusions can be drawn on NLS binding. The basic clusters of both peptides bind in the expected binding pockets, but the linker regions also make specific contacts with the receptor. Comparisons with other available Impα structures allow us to explain the specificities of monopartite and bipartite NLS binding and help us improve the definition of the consensus sequence of a conventional basic/bipartite NLS. The results will have general implications for recognizing the NLSs in new gene products identified in genome sequences and therefore for functional annotation of new proteins.

EXPERIMENTAL PROCEDURES
Peptide Synthesis-The peptides CGKRSAEGSNPPKPLKKLRGY (RB peptide) and CGRKKRKTEEESPLKDKAKKSKGY (N1N2 peptide) were synthesized using the Applied Biosystems 433A peptide synthesizer, purified by cation exchange chromatography followed by reverse phase chromatography, and analyzed by quantitative amino acid analysis using a Beckman 6300 amino acid analyzer and electrospray mass spectrometry (Sciex API 111, PerkinElmer Life Sciences) (24). The peptides RB and N1N2 correspond to the NLSs of human retinoblastoma protein, residues 861-877, and X. laevis N1N2 protein (N1N2), residues 535-555, respectively, with two heterologous residues added at each terminus.
Protein Expression, Purification, and Crystallization-N-terminally truncated mouse importin-α (α2 isoform (25)) lacking 69 N-terminal residues (m-Impα) was expressed recombinantly in Escherichia coli as a fusion protein containing a hexa-histidine tag (11). For crystallization, m-Impα was concentrated to 18.8 mg/ml (in 20 mM Tris-HCl (pH 8.0), 100 mM NaCl, and 10 mM dithiothreitol) using a Centricon-30 (Millipore) and stored at −20 °C. Crystallization conditions were screened by systematically altering various parameters using the crystallization conditions successful for other peptide complexes (11) as a starting point. The crystals of both complexes (rod-shaped, 0.5 × 0.2 × 0.1 mm for the RB peptide complex and 0.4 × 0.1 × 0.07 mm for the N1N2 peptide complex) were obtained by co-crystallization: 1 µl of protein solution, 0.7 µl of peptide solution (1.7 mg/ml with peptide/protein ratio 3.5), and 1 µl of reservoir solution were combined and suspended over 0.5 ml of reservoir solution containing 0.6 M sodium citrate (pH 6.0) and 10 mM dithiothreitol.
Diffraction Data Collection-The crystals exhibit orthorhombic symmetry (space group P2₁2₁2₁; Table I). Diffraction data were collected from single crystals transiently soaked in a solution analogous to the reservoir solution but supplemented with 23% glycerol and flash-cooled at 100 K in a nitrogen stream (Oxford Cryosystems), using a MAR-Research image plate detector (plate diameter, 345 mm) and Cu Kα radiation from a Rigaku RU-200 rotating anode generator. Data were autoindexed and processed with the HKL suite (26) (Table I).
Structure Determination and Refinement-The crystals of the peptide complexes were highly isomorphous with the crystals of full-length Impα (12); therefore, the structure of mouse Impα (Protein Data Bank number 1IAL) with N-terminal residues omitted was used as a starting model for crystallographic refinement. Electron density maps were inspected for the presence of the peptide after rigid body refinement using the program CNS (27) (RB peptide-m-Impα complex, Rcryst = 30.5%, Rfree = 32.4%, 6-4-Å resolution; N1N2 peptide-m-Impα complex, Rcryst = 30.8%, Rfree = 35.1%, 6-4-Å resolution; Table I provides the explanation of R-factors). Electron density maps calculated with coefficients 3Fobs − 2Fcalc and simulated annealing omit maps (Fig. 1) calculated with analogous coefficients were generally used. The model was improved, as judged by the free R-factor (28), through rounds of crystallographic refinement (positional and restrained isotropic individual B-factor refinement with an overall anisotropic temperature factor and bulk solvent correction) and manual rebuilding (program O (29)). Solvent molecules were added with the program CNS (27). Asn239 is an outlier in the Ramachandran plot, as also observed in all other structures of mouse Impα (11,12). Pro242 is a cis-proline. The final models comprise 427 Impα residues (residues 71-497), 20 peptide residues, and 173 water molecules for the RB peptide-Impα complex, and 426 Impα residues (residues 72-496), 21 peptide residues, and 123 water molecules for the N1N2 peptide-Impα complex (Table I). The coordinates have been deposited in the Protein Data Bank (Protein Data Bank numbers 1PJM and 1PJN for the RB and N1N2 peptide complexes, respectively).
Table I footnotes: For the merging R-factor, Ihkl,i is the intensity of an individual measurement of the reflection with Miller indices h, k, and l, and ⟨Ihkl⟩ is the mean intensity of that reflection. For Rcryst, F(obs)hkl and F(calc)hkl are the observed and calculated structure factor amplitudes, respectively. Rfree is equivalent to Rcryst but calculated with reflections (5%) omitted from the refinement process. Table entries were calculated with the programs CNS (27) and PROCHECK (30).
Structure Analysis-The quality of the models was assessed with the program PROCHECK (30). The contacts were analyzed with the program CONTACT, and the buried surface areas were calculated using the program CNS (27).
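The R-factors quoted for the refinement can be illustrated with a short sketch. It assumes the conventional definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, which is not spelled out in the text here, and uses made-up amplitudes; the function names and the systematic choice of every 20th reflection for the free set are hypothetical simplifications (free-set reflections are normally picked at random).

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over paired amplitudes (assumed definition)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(reflections, every=20):
    """Hold out 1 in `every` reflections (5% for every=20) as the free set.

    Illustrative only: a real free set is selected randomly, not systematically.
    """
    work = [r for i, r in enumerate(reflections) if i % every != 0]
    free = [r for i, r in enumerate(reflections) if i % every == 0]
    return work, free


# Hypothetical amplitudes, not data from this study.
example_f_obs = [100.0, 50.0, 25.0, 80.0]
example_f_calc = [90.0, 55.0, 30.0, 80.0]
example_r = r_factor(example_f_obs, example_f_calc)

work_set, free_set = split_free_set(list(range(40)))
```

Rcryst would be the value computed on the working set and Rfree the same quantity computed on the held-out reflections.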
Bioinformatic Analysis-We used the consensus sequence KRRK to search for NLS-containing proteins using the Quick Matrix option of Scansite (31); the sequence KRXK was entered for the primary preference positions 0 to +3, and R was entered into the secondary preference position +2. The bipartite consensus was too long to use with the current version of Scansite. Testing using proteins with known NLSs showed that there is a high likelihood of detecting a functional NLS at Scansite scores ≤ 0.0408 (corresponding to 0.448% of all yeast proteins) and a reasonable likelihood at Scansite scores ≤ 0.0596 (corresponding to 1.169% of all yeast proteins). To estimate the efficiency of detecting an unknown NLS, we performed a Scansite search with a test set of 50 randomly selected yeast nuclear proteins (Munich Information Center for Protein Sequences subcellular catalogue (32)); 9 proteins showed a sequence match to the above motif with Scansite scores ≤ 0.0408, and 23 proteins (46%) showed a match with Scansite scores ≤ 0.0569. For comparison, we searched for NLSs in the same test set of proteins using PredictNLS (33); this method detected an NLS in nine proteins, some of which did not belong to the conventional basic/bipartite group.
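The detection rates implied by these counts can be tabulated as follows. This is only a restatement of the numbers given above for the 50-protein test set; the dictionary keys and function name are illustrative labels, not terms from Scansite or PredictNLS.

```python
TEST_SET_SIZE = 50  # randomly selected yeast nuclear proteins

# Number of test-set proteins in which each search reported an NLS.
hits = {
    "Scansite, stringent threshold": 9,
    "Scansite, relaxed threshold": 23,
    "PredictNLS": 9,
}


def detection_rate(n_hits, n_total):
    """Percentage of test-set proteins with a reported NLS."""
    return 100.0 * n_hits / n_total


rates = {method: detection_rate(n, TEST_SET_SIZE) for method, n in hits.items()}

for method, rate in rates.items():
    print(f"{method}: {rate:.0f}%")
```

The relaxed Scansite threshold reproduces the 46% quoted in the text, whereas the stringent threshold and PredictNLS each correspond to 18% of the test set.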

RESULTS AND DISCUSSION
Structure Determination-The RB and N1N2 NLS peptides were co-crystallized with an N-terminally truncated mouse Impα lacking residues 1-69 (m-Impα); residues 1-69 are responsible for autoinhibition. The co-crystals with both peptides grew in similar conditions and isomorphously to other mouse Impα crystals (11,12). Electron density maps based on the Impα model, following rigid body refinement, clearly showed electron density corresponding to the peptides (Fig. 1). The structures were refined at 2.5-Å resolution for both complexes (Table I).
Residues 859-878 of the peptide RB (residue 879 had no interpretable electron density) and residues 535-555 of the peptide N1N2 (residues 533-534 and 556 had no interpretable electron density) could unambiguously be identified in the electron density maps (Fig. 1). The side chains of Glu543 and Lys549 of N1N2 were poorly ordered; therefore, these residues were modeled as alanines.
Structure of Importin-α in the Complexes-Impα forms a single elongated domain built from 10 Arm structural repeats, each containing three α helices (H1, H2, and H3) connected by loops (Fig. 2). The structure of Impα in the complexes is comparable with the crystal structure of the full-length Impα (r.m.s. deviations of Cα atoms of Impα residues 72-496 are 0.22 Å between the RB and N1N2 peptide-m-Impα complexes; 0.35 and 0.31 Å between full-length Impα and the RB and N1N2 peptide-m-Impα complexes, respectively; and 0.34 and 0.30 Å between nucleoplasmin peptide-m-Impα and the RB and N1N2 peptide-m-Impα complexes, respectively).
Binding of the NLS Peptides to Importin-α-The peptides bind in an extended conformation with the chain running antiparallel to the direction of the Arm repeat superhelix (Fig. 2). The base of the groove that contains the binding sites is formed mainly by the H3 helices of the Arm repeats, which carry some residues conserved among the repeats, including the tryptophans and asparagines at the third and fourth turns in H3 helices of the Arm repeats, respectively (10,11).
The two basic clusters of the RB and N1N2 peptides bind to two separate well defined binding sites on the surface of the m-Impα molecule, referred to as the minor and major sites (Fig. 2). The minor site specifically binds to the N-terminal basic cluster KR, and the larger, C-terminal basic cluster binds to the major site. The electron density is present for 20 peptide residues in the RB peptide (average B-factor, 63.3 Å²) and for 21 peptide residues of the N1N2 peptide (average B-factor, 72.1 Å²) (Fig. 1). In both complexes, the linker sequences connecting the major and minor sites (residues 863-873 and 539-550 for RB and N1N2, respectively) have B-factors above the average for the entire peptides (71.1 and 82.8 Å² for RB and N1N2, respectively). By contrast, the residues bound to the major sites of both complexes have lower B-factors (40.5 and 47.6 Å² for positions P2-P5 of RB and N1N2, respectively), reflecting the strong interaction of these residues with the protein.
FIG. 1. Structure determination. A, stereo view of the electron density (drawn with the program BOBSCRIPT (39)) in the region of the RB peptide bound to the major binding site of m-Impα. All peptide residues were omitted from the model, and simulated annealing was run with the starting temperature of 1000 K. The electron density map was calculated with coefficients 3Fobs − 2Fcalc and data between 40- and 2.5-Å resolution and contoured at 1.5 standard deviations. The refined model of the peptide is superimposed. B, stereo view of the electron density (contoured at 1.5 standard deviations) in the region of the N1N2 peptide bound to the major binding site of m-Impα, shown as in A.
There are 2573 and 2715 Å² of surface area buried between m-Impα and the RB and N1N2 peptides, respectively. All residues of the RB peptide, except residues 865 and 869, and of the N1N2 peptide, except residues 542 and 546, make contacts with m-Impα at distances below 4 Å (Fig. 3).
The minor and major site portions of the RB and N1N2 peptides have very similar structures (Fig. 4); after superposition of the equivalent Cα atoms, the r.m.s. deviation of the residues in positions P1-P6 is 0.19 Å, and the r.m.s. deviation of the residues in positions P1′-P3′ is 0.03 Å. By contrast, they adopt very different conformations in the linker regions between positions P3′ and P1; the path of the peptide chain is more linear in the case of RB than N1N2. The bipartite NLS linker sequence of both peptides makes favorable interactions with the H3 helices of armadillo repeats 4-7 of Impα. The most important contacts are limited to residues Asn868 and Lys871 of RB and Lys547 of N1N2.
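The r.m.s. deviations quoted in this section are computed over equivalent Cα atoms after superposition. The sketch below shows only the final step, the deviation between two coordinate sets that are assumed to be already superimposed; the optimal superposition itself is not implemented, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates, in the input units.

    Assumes the two sets are already superimposed and listed in equivalent order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions (in angstroms) for two short segments.
segment_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
segment_2 = [(0.0, 0.0, 0.1), (3.8, 0.0, -0.1)]
example_rmsd = rmsd(segment_1, segment_2)
```

With each atom displaced by 0.1 Å, the example gives an r.m.s. deviation of 0.1 Å, which is of the same order as the values reported for the basic clusters.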
Among other interactions, the conserved residues Arg315 (Arm 6) and Tyr277 (Arm 5) of Impα that interrupt the regularity of the Trp-Asn array (11) make extensive main chain and side chain contacts with the peptides. These residues also interact with nucleoplasmin NLS in mammalian and yeast Impα (10,11). However, in that case, the interaction involves only the main chain of the peptide. Other important contacts are made with Arg238 (main chain of RB and N1N2), Ser276 (side chain of N1N2), and Ser234 (side chain of RB).
Comparison with Other NLS Peptide-Importin-α Complex Structures-Significantly, the work presented here allows us for the first time to perform a detailed comparison of the structural determinants of binding of a number of NLSs to Impα, to align the NLSs with the binding pockets on Impα with some confidence, and to draw some general conclusions on the specificity of NLS binding (Table II). Despite diverse sequences, the binding of both basic clusters (positions P1-P5 and P1′-P2′) is similar in all the available structures, and the major differences occur in the linker regions connecting the basic clusters (Fig. 4). The exceptions are the N- and C-terminal portions of nucleoplasmin NLS bound to m-Impα, where the side chains of Lys155 (position P1′; N terminus of the peptide) and Lys170 (position P5; C terminus of the peptide) follow the direction of the main chain of the other peptides. Because all the other peptides contain at least one additional residue at the N and C termini, the conformation of the nucleoplasmin NLS peptide may be an artifact of the short length of the peptide. The case of nucleoplasmin highlights the importance of residues preceding position P1′ and following position P6.
The structures of nucleoplasmin NLS bound to y-Impα and m-Impα are significantly different (r.m.s. deviation of Cα atoms of residues 155-170 is 2.44 Å). In addition to the differences caused by the different lengths of the peptides discussed above, some differences may be explained by structural differences between the two Impα proteins; the yeast structure is slightly more "open" than the mouse structure (12). Most of the differences are found in the region comprising residues 159-165; there is high structural similarity when only the major site (r.m.s. deviation of Cα atoms for positions P2-P5 is 0.28 Å) and the minor site (r.m.s. deviation of Cα atoms for positions P1′-P3′ is 0.24 Å) are superimposed. With the basic clusters binding most tightly, the different curvatures of the two Impα proteins appear to be compensated in the linker region. Although some linker region residues have different conformations, the main contacts of this region with Impα are comparable in both nucleoplasmin structures. The most important interactions occur for the main chain of conserved residues Arg315, Arg238, and Tyr277 of m-Impα (Arg321, Arg244, and Tyr283 of y-Impα).
The portions of RB and N1N2 peptides bound in the major and minor sites superimpose closely with the nucleoplasmin NLS (the r.m.s. deviations of Cα atoms of positions P1-P5 and positions P1′-P3′ are 0.34 and 0.42 Å between RB and nucleoplasmin and 0.33 and 0.42 Å between N1N2 and nucleoplasmin, respectively). The structure of the nucleoplasmin linker region is more similar to that of RB than that of N1N2 (r.m.s. deviation of Cα atoms of the linker region is 1.70 Å between RB (864-873) and nucleoplasmin (157-165) and 2.20 Å between N1N2 (541-550) and nucleoplasmin (157-165)), mainly in the region closest to the major site (residues 162-165 for nucleoplasmin). The region of the linker closest to the major site is structurally conserved best among the three bipartite NLS peptides.
Binding of Bipartite NLS Linker Regions-The nucleoplasmin NLS-Impα complexes (10,11) showed that the main chain of the linker region binds to the conserved Impα residues Arg315 (Arm 6) and Tyr277 (Arm 5) (Arg321 and Tyr283 for y-Impα; these residues interrupt the regularity of the Trp-Asn array (11)). The conserved residue Arg238 (Arm 4) (Arg244 for y-Impα) also binds to the main chain of nucleoplasmin NLS. By contrast, the structures of the RB and N1N2-m-Impα complexes show the binding of Arg315 and Tyr277 to the side chains of the peptides (Arg238 binds to the main chain of the peptides). These observations suggest that residues Arg315, Tyr277, and Arg238 play crucial roles in binding bipartite NLSs; however, these interactions are specific to individual NLSs.
Other important contacts of the peptide linker with m-Impα involve the residues Asn868 and Lys871 for the RB peptide and Lys547 for the N1N2 peptide; all of them are situated close to the major site. This is the region of the linker most structurally conserved among the three bipartite NLS peptides. The electron density maps of RB and N1N2 have a superior quality in this region as compared with the rest of the linker. These data suggest that the C-terminal portion of the linker (closest to the major site) plays an important role in binding to Impα. These data are supported by NLS binding studies (15,17,19,21) and structural analyses of extended simian virus 40 (SV40) large tumor-antigen (T-Ag) peptide complexes.² The electron density maps suggest that the RB peptide is better ordered overall, consistent with a larger number of contacts. Not every linker residue is able to make contacts with the protein. Because of the differences in the lengths of the linkers between RB (11 residues, if we define the linker sequence as the sequence between sites P2′ and P2) and N1N2 (12 residues), the RB peptide binds to m-Impα in a more extended conformation than the N1N2 peptide. The 11-residue length appears more favorable for binding; 12 residues require some short turns and force the chain farther from Impα, precluding some favorable interactions. Importantly, this is consistent with the observation that incorporation of QPWL in the linker region of nucleoplasmin NLS reduced the efficiency of nuclear import (34).
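The linker lengths discussed here can be checked directly against the synthesized peptide sequences. The sketch below slices each peptide by residue number, using the linker ranges given earlier (863-873 for RB, 539-550 for N1N2). The number assigned to the first residue of each peptide (859 and 533) is an assumption inferred from the residue ranges reported for the electron density, and the dictionary layout and function name are illustrative.

```python
# Peptide sequences as synthesized; "first_residue" is inferred, not stated explicitly.
PEPTIDES = {
    "RB": {"sequence": "CGKRSAEGSNPPKPLKKLRGY", "first_residue": 859, "linker": (863, 873)},
    "N1N2": {"sequence": "CGRKKRKTEEESPLKDKAKKSKGY", "first_residue": 533, "linker": (539, 550)},
}


def segment(sequence, first_residue, start, end):
    """Return residues start..end (inclusive, protein numbering) of a peptide."""
    return sequence[start - first_residue : end - first_residue + 1]


summary = {}
for name, info in PEPTIDES.items():
    seq, first = info["sequence"], info["first_residue"]
    start, end = info["linker"]
    summary[name] = {
        "minor_site_pair": segment(seq, first, start - 2, start - 1),  # P1'-P2'
        "linker": segment(seq, first, start, end),
        "major_site": segment(seq, first, end + 1, end + 4),  # P2-P5
    }

for name, parts in summary.items():
    print(name, parts["minor_site_pair"], parts["linker"], len(parts["linker"]), parts["major_site"])
```

Under these assumptions both peptides place Lys-Arg immediately before the linker, the linkers are 11 and 12 residues long, and the major-site clusters read KKLR (RB) and KKSK (N1N2).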
The affinities for mouse Impα of N1N2 and RB NLSs fused to β-galactosidase have been determined using enzyme-linked immunosorbent assays (15,19). N1N2 binds with a higher affinity than RB to both the Impα/β complex (Kd measured as 5.4 nM for N1N2 and 45 nM for RB) and Impα alone (Kd measured as 22 nM for N1N2 and 180 nM for RB). Therefore, although the linker region appears to interact more favorably in the case of RB, it is most likely the presence of Lys in position P5 that is responsible for the higher affinity of the recognition of N1N2 NLS by Impα (see below).
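To put these dissociation constants on an energy scale, the difference in binding free energy between two ligands can be estimated as ΔΔG = RT ln(Kd,weak / Kd,tight). The sketch below applies this standard relation to the quoted values; the temperature of 298.15 K is an assumption (the assay temperature is not given here), and the value for N1N2 with Impα alone is read as 22 nM.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15       # assumed; not stated in the text

# Dissociation constants in nM, as quoted above.
KD_NM = {
    ("N1N2", "Imp-alpha/beta"): 5.4,
    ("RB", "Imp-alpha/beta"): 45.0,
    ("N1N2", "Imp-alpha"): 22.0,
    ("RB", "Imp-alpha"): 180.0,
}


def delta_delta_g(kd_weak, kd_tight, temperature=TEMPERATURE_K):
    """Free energy difference (kcal/mol) favoring the tighter binder."""
    return R_KCAL_PER_MOL_K * temperature * math.log(kd_weak / kd_tight)


ddg_heterodimer = delta_delta_g(KD_NM[("RB", "Imp-alpha/beta")], KD_NM[("N1N2", "Imp-alpha/beta")])
ddg_alpha_alone = delta_delta_g(KD_NM[("RB", "Imp-alpha")], KD_NM[("N1N2", "Imp-alpha")])

print(f"N1N2 vs RB, Imp-alpha/beta: {ddg_heterodimer:.2f} kcal/mol")
print(f"N1N2 vs RB, Imp-alpha alone: {ddg_alpha_alone:.2f} kcal/mol")
```

Under these assumptions the roughly eightfold difference in Kd corresponds to about 1.2-1.3 kcal/mol in favor of N1N2 with either form of the receptor.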
The Role of Minor Site Binding-Mutagenesis studies with nucleoplasmin (34), RB (15), and N1N2 (19) NLSs revealed that substitution of the P1′ and P2′ residues with residues other than Arg or Lys abolished nuclear localization. The binding of Lys-Arg at positions P1′ and P2′ is observed in all available bipartite NLS peptide-Impα complex structures. Similarly, the majority of the bipartite NLSs characterized contain a Lys-Arg sequence in the N-terminal cluster (35). Comparison of the structures here reveals a high structural similarity of the binding of the Arg side chain in the position P2′ in all cases. The Arg side chain is situated in the groove created by the tryptophans Trp399 and Trp357 (Trp405 and Trp363 for y-Impα) located in H3 helices of Arm repeats 7 and 8 and also contacts Glu396 and Ser360 (Glu402 and Ser366 in y-Impα); these residues are conserved among known Impα sequences. The interaction with Glu396 appears particularly important because the distance from the Arg side chain is nearly identical in all the structures. The binding of T-Ag at the minor site (11) represents a model for binding of Lys (instead of Arg) at P2′; the peptide PKKKRKV (residues 126-132) places Lys128 and Lys129 at positions P1′ and P2′, respectively. This likely occurs because it is more favorable to have some amino acid binding at positions P4′ and P5′ than to have an Arg at P2′. However, some evidence of staggering in different registers was observed in the crystal structure (11). The Lys at position P2′ of T-Ag contacts Glu396 and Ser360 at approximately the same distance but does not contact Trp399, losing one favorable contact. The monopartite NLS from c-Myc binds with Lys-Arg in the P1′-P2′ positions (10).
A Lys side chain at position P1′ makes less favorable contacts than Arg at position P2′, and this position appears less important than the P2′ position for bipartite NLS binding. The P1′ Lys of RB and N1N2 peptides makes interactions with the side chains of the conserved Thr328 and Asn361 and the main chain of Val321. Also close by is Asp325. The P1′ Lys of nucleoplasmin bound to y-Impα also contacts the equivalent of Thr328 and Val321 but not Asn361. The pocket prefers a Lys residue because an Arg side chain is too long to make similar favorable contacts in the P1′ pocket.
Finally, we suggest that the positions preceding P1′ and following P2′ contribute significantly to the minor site binding. This is consistent with the side chain of the N-terminal Lys155 (position P1′) of the nucleoplasmin peptide preferring to follow the main chain of the other peptides instead of binding in the regular P1′ pocket. Also, the T-Ag peptide PKKKRKV (residues 126-132) binds to Impα with two Lys residues in P1′ and P2′ instead of the more favorable Arg at P2′, possibly so that it can place some amino acid in positions P4′ and P5′.
The NLS Consensus-The definition of an NLS consensus based on an analysis of sequences alone is difficult due to the diversity of sequences that can function as nuclear targeting signals. The structural data, the accumulated knowledge on functional NLSs, and mutagenesis studies now allow us to better define the conventional basic/bipartite NLS consensus that involves Impα binding. The ability to recognize NLSs is of significance, as it could help with the functional annotation of new proteins predicted in genomic sequences.
We used an approach to define the NLS consensus similar to that used previously to define optimal substrates for protein kinases (36). The structures of Impα-NLS peptides were analyzed by molecular modeling in conjunction with other available information. This allows us to draw some conclusions about individual positions in the NLS.
One general observation is that the binding pockets for individual side chains are much better defined in the major site than in the minor site. This observation is consistent with the contribution of the N-terminal basic cluster (minor site) in a bipartite NLS of ~4 kcal/mol, which is comparable with the loss of a single Lys side chain in the major basic cluster (21). A large portion of the free energy of binding is contributed by the main chain of the peptide (estimated at ~6.9 kcal/mol), but a functional NLS requires an additional contribution of the side chains of about 4.2 kcal/mol; this contribution by the side chains accounts for the specificity of nuclear import. The best defined pockets include P2, P3, and P5 of the major site. P2 is defined mainly by residues Thr155 and Asp192 in adjacent Arm repeats of Impα and is well suited for binding a Lys side chain. Arg can be accommodated only to a lesser degree, consistent with the loss of 2.7 kcal/mol for the Lys to Arg mutation in the context of T-Ag (21). Side chains such as Thr, Gln, and Pro could be accommodated but would contribute less to binding; there is a substantial electrostatic contribution to binding in this pocket. P3, comprising Glu266 and Asp270 as the major binding determinant, prefers Arg over Lys based on Ala mutagenesis; this is likely because an Arg can reach closer to Glu266 and Asp270. The energetic contribution of the side chains in the P3 site is about two-thirds of the contribution in P2, with a smaller contribution of the electrostatic terms. The modeling suggests that P5, with the primary binding residue Gln81, is best suited for binding a Lys side chain. The electrostatic contribution is very minor, and positive charge is not strictly required; due to size, Gln and Glu would be favored over Arg. The energetic contribution of this site is similar to P3 (21).
The other pockets are less specific. The P4 pocket is relatively large and should favor Arg because this is the only side chain that can reach the main chain of Impα Arg106 and Glu107 and make favorable interactions. However, hydrophobic side chains such as Pro, Val, Ile, or Leu could also be accommodated relatively well. The discrimination between Arg and Val is only minor based on the Ala mutagenesis results of T-Ag and c-Myc NLSs, and the contribution of this pocket is about four times smaller than that of P2. P1, with Asp270 responsible for specificity, prefers Lys over Arg, whereas side chains such as Gln, Pro, or Gly could easily be tolerated, and the side chain contribution in this pocket is only half of that of pocket P4. The pocket preceding P1 has Trp231 as the major specificity determinant and also prefers Lys over residues such as Arg, Pro, Ala, Glu, or Gly, but is also not very specific. The minor site is much less defined than the major site. Only the sites P1′ and P2′ appear capable of significant discrimination between different peptide side chains, with Lys and Arg favored in those two positions, respectively, as discussed above. The N-terminal basic cluster may play a role in relieving autoinhibition (the minor site is not autoinhibited (12)).
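The relative side chain contributions of the major-site pockets described above can be collected into a single ranking. The sketch below simply encodes the stated ratios with P2 set to 1; treating "similar to P3" as equal to P3 and the approximate ratios as exact fractions are simplifying assumptions, and the variable names are illustrative.

```python
from fractions import Fraction

# Side chain contribution of each major-site pocket relative to P2 (P2 = 1).
relative = {"P2": Fraction(1)}
relative["P3"] = Fraction(2, 3) * relative["P2"]  # "about two-thirds" of P2
relative["P5"] = relative["P3"]                   # "similar to P3" (taken as equal)
relative["P4"] = relative["P2"] / 4               # "about four times smaller" than P2
relative["P1"] = relative["P4"] / 2               # "only half" of P4

# Sort from the largest to the smallest contribution; ties keep insertion order.
ranked = sorted(relative, key=relative.get, reverse=True)

for pocket in ranked:
    print(f"{pocket}: {float(relative[pocket]):.3f} x P2")
```

On this reading the order is P2, then P3 and P5, then P4, then P1, with P1 contributing roughly one-eighth as much as P2.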
We conclude that the optimal basic/bipartite consensus sequence for binding Impα is KRX(10-12)KRRK, with Lys at position P2 (bold and underlined) the most important specificity determinant that also forces the rest of the sequence to bind in one register. The basic residues at positions P3 and P5 (underlined) are also significant (Table II). To facilitate accurate NLS identification in novel sequences, the motif could be represented in a matrix format with probabilities for each of the 20 amino acids defined at each position (31); this should be possible when experimental data are available through a peptide library (37) or equivalent experiment.
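As a minimal sketch of how the optimal consensus could be used as a literal search pattern, the regular expression below requires KR, then 10 to 12 residues of any type, then KRRK. This strict reading is an assumption made for illustration and is not the Scansite-based search described under "Experimental Procedures"; the function name and the synthetic test sequence are hypothetical.

```python
import re

# Strict reading of the optimal consensus: K-R, 10-12 residues of any type, K-R-R-K.
STRICT_BIPARTITE = re.compile(r"KR.{10,12}KRRK")


def find_strict_bipartite(sequence):
    """Return (1-based start, matched text) for non-overlapping strict matches."""
    return [(m.start() + 1, m.group()) for m in STRICT_BIPARTITE.finditer(sequence)]


# Synthetic sequence built to satisfy the pattern (not a natural protein).
synthetic = "MA" + "KR" + "SAEGSNPPKPL" + "KRRK" + "GS"

# Peptides used in this study, as synthesized.
rb_peptide = "CGKRSAEGSNPPKPLKKLRGY"
n1n2_peptide = "CGRKKRKTEEESPLKDKAKKSKGY"

synthetic_hits = find_strict_bipartite(synthetic)
rb_hits = find_strict_bipartite(rb_peptide)
n1n2_hits = find_strict_bipartite(n1n2_peptide)
```

The synthetic sequence is matched, whereas the RB and N1N2 peptides are not, because their major-site clusters (KKLR and KKSK) deviate from the optimal KRRK. A literal pattern therefore misses functional NLSs, which is in line with the suggestion that a position-specific matrix would be a better representation.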
Our discussion assumes that there is a linear correlation between Impα binding and the rate of nuclear import, as the available data suggest (15,16,38). However, a functional NLS may display both lower and upper limits in affinity (the NLS needs both to bind and to release) (21). Therefore, the optimal Impα-binding sequence is not necessarily an optimal NLS in terms of the overall nuclear import process. It should be possible to define both limits experimentally and find a correlation between the optimal Impα-binding sequence and the optimal NLS.
An alternative approach to finding NLSs is to build an expert database of experimentally known NLSs and extend it through "in silico mutagenesis," as implemented in PredictNLS (33). The accuracy and coverage of this approach depend on the number of experimentally known NLSs. This approach does not distinguish between different nuclear import pathways, does not have a significant predictive power, and thus is complementary to our approach of defining the conventional basic/bipartite NLS consensus. A comparison of the efficiency of NLS detection in known nuclear proteins between PredictNLS and our consensus-based search using Scansite (31) revealed the latter approach to be superior, although a significant portion of the nuclear proteins does not utilize the Impα/Impβ-dependent import pathway (see "Experimental Procedures").
Conclusions-The structures of the complexes of mammalian Impα with the peptides RB and N1N2, corresponding to bipartite NLSs, and their comparisons with other available Impα structures, provide new insights into the molecular basis of nuclear import and its regulation. Although the binding of both basic clusters is comparable in all the structures, the linker region connecting the two basic clusters of an NLS makes different but specific contacts with Impα in the cases of all three common linker lengths (10, 11, and 12 residues). The most critical region in the linker for binding is the extreme C-terminal end of the linker sequence. The most important residue binding in the minor site is an Arg at position P2′ of the NLS, although the surrounding residues make significant contributions. The integration of the structural information with sequences of characterized NLSs and mutagenesis data allows us to provide an improved consensus sequence for the conventional basic/bipartite NLS, KRX(10-12)KRRK.