Probing the Specificity of Binding to the Major Nuclear Localization Sequence-binding Site of Importin-α Using Oriented Peptide Library Screening*

Importin-α is the nuclear import receptor that recognizes the classic monopartite and bipartite nuclear localization sequences (cNLSs), which contain one or two clusters of basic amino acids, respectively. Different importin-α paralogs in a single organism are specific for distinct repertoires of cargos. Structural studies revealed that monopartite cNLSs and the C-terminal basic clusters of the bipartite cNLSs bind to the same site on importin-α, termed the major cNLS-binding site. We used an oriented peptide library approach with five degenerate positions to probe the specificity of the major cNLS-binding site in importin-α. We identified the sequences KKKRR, KKKRK, and KKRKK as the optimal sequences for binding to this site for mouse importin-α2, human importin-α1, and human importin-α5, respectively. The crystal structure of mouse importin-α2 with its optimal peptide confirmed the expected binding mode resembling the binding of simian virus 40 large tumor-antigen cNLS. Binding assays confirmed that the peptides containing these sequences bound to the corresponding proteins with low nanomolar affinities. Nuclear import assays showed that the sequences acted as functional cNLSs, with specificity for particular importin-αs. This is the first time that structural information has been linked to an oriented peptide library screening approach for importin-α; the results will contribute to understanding of the sequence determinants of cNLSs, and may help identify as yet unidentified cNLSs in novel proteins.

Cellular compartments are the defining feature of a eukaryotic cell. The nucleus allows the separation of the genetic material and transcriptional machinery from the translational and metabolic processes in the cytoplasm. The nucleus is defined by a double membrane called the nuclear envelope, and transport across this barrier occurs through large macromolecular assemblies called nuclear pore complexes (1,2). Transport of proteins through these pores is facilitated by nucleocytoplasmic transport factors termed karyopherins. These proteins recognize special sequences within the cargo proteins and selectively transport them into (importins) or out of (exportins) the nucleus. Most of the transport factors are built from repetitive sequences such as HEAT (huntingtin, EF3, A subunit of PP2A, TOR1) repeats or armadillo (ARM) repeats, and belong structurally to the class of proteins termed solenoid proteins (3).
The best characterized nuclear import pathway requires a positively charged sequence within the cargo protein, termed the classic nuclear localization sequence (cNLS) (4). This sequence binds to the ARM repeat protein importin-α (Impα). Impα, through its N-terminal sequence termed the importin-β-binding (IBB) domain, in turn binds importin-β (Impβ1) (5), a HEAT-repeat protein. Impβ1 transiently binds to the proteins lining the nuclear pore (nucleoporins) and in this way facilitates the translocation of this trimeric complex through the pore. The principal regulator of importin-dependent nuclear transport pathways is the small GTPase Ran, which cycles between GDP- and GTP-bound states (6,7); RanGTP binding to Impβ1 dissociates the cargo complex once it reaches the nucleus (4).
Two classes of cNLSs recognized by Impα have been distinguished: monopartite cNLSs, which contain one cluster of basic amino acids, and bipartite cNLSs, which contain two clusters of basic amino acids separated by a variable linker (8). Structural studies have shown, however, that both these types of cNLSs bind to the same site on Impα (9-16). The cNLS-binding site corresponds to a long groove formed by the ARM repeat structure, and the N- and C-terminal basic clusters of the bipartite cNLS interact with repeats 4-8 and 1-4, respectively, referred to as the minor and major cNLS-binding sites (9,11,12). By contrast, the monopartite cNLS interacts mainly with the major binding site (10,11,13). A recent study proposed subdividing cNLSs further into six different classes, based on the results of screening random peptide libraries using mRNA display (17). The interpretation of the screening results in this work relied on a sequence alignment of the derived peptides. Furthermore, no structural data have been presented to support different binding modes for these different classes, and therefore it is most likely that all six classes use the same binding site, exploiting different features of the site. Class-1 and Class-2-like sequences have already been shown to use an identical binding mode (12,18).
Integration of the available structural data with thermodynamic studies of different cNLSs and their mutants (17,24,25) defined a consensus for a bipartite cNLS as KRX₁₀₋₁₂KRRK, where the Lys in bold and underlined represents the most important specificity determinant, and the basic residues at the other underlined positions are also critical (12). The monopartite cNLS simply corresponds to the C-terminal basic cluster of a bipartite cNLS. The critical residues in the bipartite cNLS have been termed P1′-P2′ and P2-P5 for the N- and C-terminal clusters, respectively (12). It is clear, however, that the flanking sequences and the composition of the linker in the bipartite cNLS can further modulate the binding affinity (13,17,25). An additional complicating factor is that both Impα proteins from different organisms and different Impα paralogs in a single organism exhibit differences in specificity (20,21,26-34); for example, the transcription factors Brn2 (30) and STAT1/STAT2 (29) show specificity for Impα5, whereas RCC1 shows specificity for the Impα3/4 family (28).
Our understanding of the details of the specificity of cNLS·Impα binding and, therefore, the ability to identify cNLSs in protein sequences remains limited. To address these points, we used an oriented peptide library approach to select the optimal sequences for binding to the major cNLS-binding site in representatives of two different clades of Impα. The oriented peptide library approach has previously been used successfully to determine the optimal sequences recognized by, for example, SH2 domains (35), protein kinases (36), and 14-3-3 proteins (37). We confirmed, using binding assays, that the peptides corresponding to the selected sequences bound to Impα proteins with high affinities, and established the structural basis of their binding to Impα by determining the crystal structure of an Impα·peptide complex. We also demonstrate that these peptides can function as cNLSs, using in vitro nuclear import assays. Thus, linking structural information to an oriented peptide library screening approach for the first time, the results contribute to understanding of the sequence determinants of cNLSs and may help to identify as yet unidentified cNLSs in novel proteins.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-Hexa-His-tagged recombinant Impα proteins were used for crystallography and nuclear import assays, whereas GST-tagged proteins were used for peptide library screening and ELISA-based binding assays. The cDNAs encoding sequences of human importin-α1/karyopherin-α2 comprising amino acids 67-503 (hImpα1 67-503; RefSeq accession number NP_002257) and human importin-α5/karyopherin-α1 comprising amino acids 60-511 (hImpα5 60-511; NP_002255) were amplified by PCR, and the PCR products were digested with the endonucleases EcoRI and BamHI, purified by agarose gel electrophoresis, and ligated into similarly digested vector pGex-KG (insert:vector molar ratios of 6:1 or 3:1). The ligated DNA was transformed into DH5α Escherichia coli cells. Colonies were screened by endonuclease digestion and sequenced to ensure the fidelity of the constructs. The pGex-KG-hα1 and pGex-KG-hα5 plasmids were transformed into BL21(DE3) host cells, and the GST-tagged proteins were expressed by isopropyl-β-D-thiogalactopyranoside (all chemicals were from Sigma unless stated otherwise) induction at room temperature (23°C) for 4 h. Protein degradation was observed for the hImpα5 GST fusion protein after inducing with 1 mM isopropyl-β-D-thiogalactopyranoside, but this was minimized by growing the bacteria at 37°C to A600 of 0.8 without induction. Bacteria carrying mouse importin-α2 (mImpα2; NP_034785) and mouse importin-β (mImpβ1; NP_032405) GST fusion protein-encoding constructs (38) were grown to A600 of 1.0 and induced with 1 mM isopropyl-β-D-thiogalactopyranoside for 4 h at 28°C. The proteins were purified by affinity chromatography using a glutathione-Sepharose 4B resin. They were eluted using 50 mM glutathione, and the glutathione was removed by overnight dialysis. The purity of the proteins was estimated to be >90% by SDS-PAGE.
Hexa-His-tagged mImpα2 70-529 (comprising amino acids 70-529) was expressed and purified by nickel affinity chromatography as described previously (12). For crystallization experiments, cation-exchange chromatography was also performed to increase purity. The protein was eluted with 200 mM sodium chloride, and salt was removed by dialysis. The purity was estimated to be 98% by SDS-PAGE.
Peptide Synthesis-Peptides were synthesized according to the standard benzotriazole-1-yl-oxy-tris-(dimethylamino)phosphonium hexafluorophosphate/1-hydroxybenzotriazole coupling protocol. The synthesized peptides were characterized by reversed-phase high-performance liquid chromatography (125 × 4.6 mm, C18 Zorbax column, linear gradients with 0.1% trifluoroacetic acid (Merck) over 25 min, flow 1 ml/min) and by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. The solutions were lyophilized, and the peptides were stored at −80°C. The sequences of peptides pepTH1, pepTH5, and pepTM are given in Table 1. The peptides were synthesized with a Cys residue at the N terminus to facilitate nuclear import assays.
The peptide library comprised peptides with the sequence GSEFESPXKXXXXEA (Table 1), where X represents any amino acid except Cys and Trp. Cys and Trp were omitted to avoid oxidation and sequencing problems (39). The total theoretical degeneracy of this library is 18^5. An aliquot of each library was sequenced by Edman degradation to confirm that all 18 amino acids were present in similar amounts at all degenerate positions.
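The library size follows directly from the template. The short sketch below counts the degenerate positions and the resulting number of distinct peptides; the variable and function names are illustrative only and are not taken from the original work.

```python
# Library template from the text: X marks a degenerate position.
TEMPLATE = "GSEFESPXKXXXXEA"

# The 20 standard amino acids minus Cys (C) and Trp (W), as stated in the text.
ALLOWED = "ADEFGHIKLMNPQRSTVY"


def degenerate_positions(template):
    """Return the 1-based indices of the degenerate (X) positions."""
    return [i for i, aa in enumerate(template, start=1) if aa == "X"]


def library_size(template, allowed=ALLOWED):
    """Number of distinct peptides encoded by the template."""
    return len(allowed) ** len(degenerate_positions(template))


positions = degenerate_positions(TEMPLATE)
size = library_size(TEMPLATE)
print(positions)  # [8, 10, 11, 12, 13]
print(size)       # 1889568
```

The fixed Lys sits at position 9, between the first degenerate position and the remaining four.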
Peptide Library Screening-Impα-GST fusion proteins (1 mg) were immobilized on 100 µl of glutathione-agarose beads. 300 µl of solution containing 1 mg of the peptide mixture was added to the immobilized Impα-GST and incubated for 4 h at 4°C to allow for peptide binding to Impα. Unbound peptides were subsequently removed by rapid washing twice with 1 ml of ice-cold HBS (Hepes-buffered saline) buffer, followed by 1 ml of HBS buffer without Tween 20. To elute the peptides, the beads were boiled for 5 min, causing the proteins to denature and release the peptides. The denatured proteins were then separated from the peptides using a 10-kDa cut-off Microcon (Millipore). The flow-through was collected and sequenced on the protein-sequencing system 785A (Applied Biosystems) using Edman degradation. As a control, the same procedure was performed with GST lacking Impα.
The abundance of each amino acid at a given cycle in the sequence of the peptide mixture was divided by the abundance of the same amino acid in the same cycle of the starting mixture. This accounts for variations in the abundance of amino acids at particular residues in the starting library or variations in the yield of amino acids during sequencing. Also, the molar percentages of amino acids in each cycle for the sample were normalized relative to those in the GST control. To scale the relative preferences among all amino acids present in the degenerate library, these raw preference values were then summed and normalized to the total number of amino acids in the degenerate position. Selectivity values >2.0 indicate strong selection (37). It should be noted that the level of background noise in both GST-only and GST-importin fusion proteins necessitated a cycle-based approach to amino acid quantification. Individual amino acid intensities were calculated by comparison with the preceding cycle and tracked in the subsequent cycle.
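The normalization described above can be sketched as follows. This is one reading of the procedure, assuming that the raw ratios are rescaled so that they sum to the number of amino acids at the position (so a residue with no selection scores 1.0). The abundances are invented toy values for four residues, not data from the study, and the function names are hypothetical.

```python
def selectivity(bound, reference):
    """Selectivity values for one sequencing cycle.

    bound, reference: dicts mapping amino acid -> molar abundance in the
    same Edman cycle (eluted peptides vs. starting library or GST control).
    """
    # Raw preference: abundance in the bound fraction relative to the reference.
    raw = {aa: bound[aa] / reference[aa] for aa in bound}
    total = sum(raw.values())
    n = len(raw)
    # Rescale so the values sum to n; no selection then gives 1.0 per residue.
    return {aa: n * r / total for aa, r in raw.items()}


def strongly_selected(values, threshold=2.0):
    """Residues whose selectivity exceeds the threshold (2.0 in the text)."""
    return sorted(aa for aa, v in values.items() if v > threshold)


# Toy abundances (hypothetical, for illustration only).
reference = {"K": 10.0, "R": 10.0, "A": 10.0, "E": 10.0}
bound = {"K": 30.0, "R": 10.0, "A": 5.0, "E": 5.0}

values = selectivity(bound, reference)
hits = strongly_selected(values)
print(hits)  # ['K']
```

In this toy cycle Lys scores about 2.4 and is the only residue above the 2.0 threshold.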
Cross-linking of Peptides with Alexa Fluor 488-Streptavidin-N-terminal Cys-containing peptides (pepTH1, pepTH5, and pepTM) were cross-linked to Alexa Fluor 488-streptavidin (fst, Molecular Probes) using the heterobifunctional cross-linker N-succinimidyl 3-(2-pyridyldithio)propionate (Sigma) in a two-step reaction. 4.2 nmol of Alexa Fluor 488-streptavidin in buffer S2 (50 mM Na3PO4, 0.15 M NaCl, pH 7.2) was mixed with 33.6 nmol (8-fold molar excess over Alexa Fluor 488-streptavidin) of an ethanolic solution of N-succinimidyl 3-(2-pyridyldithio)propionate. The mixture was incubated at room temperature overnight. The excess N-succinimidyl 3-(2-pyridyldithio)propionate was removed by running the mixture through a PD10 column. To determine the number of N-succinimidyl 3-(2-pyridyldithio)propionate molecules cross-linked to Alexa Fluor 488-streptavidin, a 5-µl aliquot of the solution was reduced with 5 µl of 0.5 M dithiothreitol and incubated for 1 h, made up to 1 ml with deionized water, and analyzed for pyridine-2-thione production by measuring absorbance at 343 nm. 1.2 mg of peptide (20-fold molar excess with respect to cross-linked Alexa Fluor 488-streptavidin) was added to 33.6 nmol of cross-linked Alexa Fluor 488-streptavidin and incubated at 20°C for 24 h in the dark. To determine the number of peptides conjugated to each molecule of cross-linked Alexa Fluor 488-streptavidin, absorbance was again measured at 343 nm to assess the production of pyridine-2-thione. The final product was buffer-exchanged to HBS buffer using a PD10 column. The conjugates were aliquoted and stored at −20°C in the dark.
ELISA-based Binding Assays-The binding of peptides to bacterially expressed GST-tagged importins was analyzed using an ELISA-based binding assay (40). Peptides were coated into the wells of polystyrene microtiter plates (Nunclon) in triplicate at 5 pmol/well using 50 mM NaHCO3 (pH 9.6) for 16 h at 4°C. The wells were blocked by adding 400 µl of 1× IB buffer (110 mM KCl, 5 mM NaHCO3, 5 mM MgCl2, 1 mM EGTA, 0.1 mM CaCl2, 20 mM Hepes, pH 7.4) containing 1% bovine serum albumin for 1 h at room temperature with shaking (Plateform rocker, Bio-Line), and then washed two times with IB buffer, with the second wash including an incubation for 1 h at room temperature with shaking. Serial dilutions of Imp proteins were then added in IB buffer containing 1 mM dithiothreitol and 1% bovine serum albumin to the microtiter plates and incubated for 16 h at 4°C with shaking. After extensive washing with IB buffer containing 1 mM dithiothreitol and 1% bovine serum albumin, anti-GST antibody (500 ng/well, Amersham Biosciences) was added, and the plates were incubated for 3 h at room temperature with shaking. Non-specifically bound antibodies were removed by washing 10 times with phosphate-buffered saline containing 0.3% Tween 20 (Bio-Rad). Alkaline phosphatase-conjugated rabbit anti-goat IgG (0.025 unit of enzyme/well, Sigma) was then added, the plates were incubated for 1 h at room temperature with shaking, and washing was performed as described for the primary antibody. Binding activity was then determined by adding the chromogenic substrate para-nitrophenyl phosphate (1 mg/ml, Sigma) dissolved in 10% diethanolamine and 0.5 mM MgCl2 (pH 9.8). The change of absorbance at 405 nm was measured over 90 min using a plate reader (Molecular Devices, Menlo Park, CA).
Processing of the ELISA data was carried out using Microsoft Excel software (40,41). Binding affinities were evaluated by plotting the corrected absorbance for different concentrations against time using KaleidaGraph 2.13 software. The data in the linear range were used to obtain the absorbance change per min (A/min), which was plotted against the concentration of the protein. The curves obtained were fitted using Equation 1, where x is the concentration of the protein, B is the level of protein bound, and k is the dissociation constant representing the concentration of the protein yielding half-maximal binding.
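Equation 1 is not reproduced in this text. The description of k as the protein concentration giving half-maximal binding is consistent with a one-site hyperbolic form, B(x) = Bmax·x/(k + x), and the sketch below fits that assumed form by least squares. Both the functional form and the fitting routine are assumptions made for illustration; the authors' actual equation and their KaleidaGraph procedure may differ. The data are synthetic.

```python
def _best_bmax(conc, rate, k):
    """Closed-form least-squares Bmax for a fixed k."""
    f = [x / (k + x) for x in conc]
    return sum(r * fi for r, fi in zip(rate, f)) / sum(fi * fi for fi in f)


def _sse(conc, rate, k):
    """Sum of squared residuals at a fixed k, using the best Bmax for that k."""
    b = _best_bmax(conc, rate, k)
    return sum((r - b * x / (k + x)) ** 2 for x, r in zip(conc, rate))


def fit_one_site(conc, rate, log_lo=-3.0, log_hi=4.0, steps=50, rounds=8):
    """Fit rate = Bmax * x / (k + x) by a repeatedly refined grid on log10(k).

    Returns (Bmax, k). The search window for k is 10**log_lo .. 10**log_hi.
    """
    lo, hi = log_lo, log_hi
    best = lo
    for _ in range(rounds):
        width = (hi - lo) / steps
        grid = [lo + width * i for i in range(steps + 1)]
        best = min(grid, key=lambda g: _sse(conc, rate, 10.0 ** g))
        # Zoom in on the neighbourhood of the best grid point.
        lo, hi = best - width, best + width
    k = 10.0 ** best
    return _best_bmax(conc, rate, k), k


# Synthetic, noise-free example: Bmax = 0.05 A/min, k = 8 (concentration units).
conc = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
rate = [0.05 * x / (8.0 + x) for x in conc]

bmax_fit, k_fit = fit_one_site(conc, rate)
print(round(bmax_fit, 4), round(k_fit, 3))
```

Because Bmax has a closed-form optimum for any fixed k, only one parameter needs to be searched, which keeps the sketch free of external fitting libraries.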
Nuclear Import Assays-A method used to mechanically perforate renal epithelial cells (42) has been adapted to cells of the rat hepatoma tissue culture line, a derivative of the Morris hepatoma 7288C cell line (43) (Flow Laboratories, Bonn, Germany). Cells were cultured in Dulbecco's modified Eagle's medium (ICN Biomedical, Inc.) supplemented with 10% heat-inactivated fetal calf serum (from CSL Ltd.) in a humidified incubator (Forma Scientific, Inc.) with 5% CO2 atmosphere at 37°C (43,44). The cells were grown in 50-ml flasks (Nunclon, Denmark) and subcultured every 3-4 days at 90-100% confluence in a laminar flow hood (Airpure, Westinghouse Pty. Ltd., Australia). Hepatoma tissue culture cells for in vitro import assays were seeded on coverslips (15 mm × 15 mm) in 12-well culture plates (Nunclon) with Dulbecco's modified Eagle's medium and grown for 36 h to 95% confluence. The Dulbecco's modified Eagle's medium was then replaced with 1 ml of Dulbecco's modified Eagle's medium without phenol red and glutamine (ICN) but containing 1 M Hepes plus 10% fetal calf serum, prior to use for analysis of nuclear import. Coverslips were rinsed with IB buffer and excess liquid drained. To mechanically perforate cells, a single layer of tissue paper was placed onto the cell monolayer and rapidly removed 5 s later. Cells were washed once more with IB buffer to remove residual cytosolic material, and the coverslip was then placed with the cell-bearing surface face down onto a 5-µl drop of nuclear import mix containing an ATP-regenerating system (0.125 mg/ml creatine kinase, 30 mM creatine phosphate, 2 mM ATP, and 2 mM GTP), 30 mg/ml bovine serum albumin, transport substrate (APC-labeled fusion or fst-labeled protein), and Texas Red-labeled dextran (Sigma). Where appropriate, 4 µM RanGDP and 1 µM Impα·Impβ were added.
Imaging was performed using a Bio-Rad MRC-600 confocal laser scanning microscope with a 60× magnification oil immersion lens (Nikon), with images being stored every 2-4 min up to 30 min. Fluorescein-5-isothiocyanate-labeled dextran or Texas Red-labeled dextran (Sigma, 70-kDa molecular mass) was used to monitor nuclear membrane integrity.
Crystallization and Crystal Structure Determination-For crystallization, mImpα2 70-529 was concentrated to 19 mg/ml using a Centricon-30 (Millipore) and stored at −20°C. The crystals were obtained by cocrystallization, combining 1 µl of protein solution, 0.5 µl of peptide solution (peptide/protein molar ratio of 4), and 1 µl of reservoir solution on a coverslip and suspending the drop over 0.5 ml of reservoir solution. Single crystals were obtained with a reservoir solution containing 0.65-0.70 M sodium citrate (pH 6.0) and 10 mM dithiothreitol after 15-20 days. X-ray diffraction data were collected using a wavelength of 1.46 Å at a synchrotron-radiation source (Laboratório Nacional de Luz Sincrotron, Campinas, Brazil) with a MAR charge-coupled device imaging-plate detector (MAR Research). A crystal was mounted in a nylon loop, transiently soaked in reservoir solution supplemented with 25% glycerol, and flash-cooled at 100 K in a nitrogen stream (Oxford Cryosystems). The data were processed using the HKL2000 package (45) (Table 2). The crystal presented orthorhombic symmetry (space group P2₁2₁2₁) and was isomorphous with other mImpα2 70-529·peptide complexes (a = 78.86 Å, b = 90.90 Å, c = 100.97 Å). The structure of the complex with the bipartite cNLS from the Xenopus laevis phosphoprotein N1N2 (PDB ID 1PJN (12)), with the peptide omitted, was used as the starting model for crystallographic refinement. After rigid body refinement using the program Refmac (46), electron density maps were inspected, confirming the presence of the peptide not only in the major site but also in the minor binding site. The models were improved, as judged by the free R-factor, through rounds of crystallographic refinement (positional and restrained isotropic individual B-factor with an overall anisotropic temperature factor and bulk solvent correction) and manual modeling using the program Coot (47).
The final model comprises 427 residues (71-497), two peptide ligands (9 residues could be modeled in the major site, and 6 residues in the minor site), 103 water molecules, and a citrate ion (Table 2). The quality of the model was checked with the programs PROCHECK (48) and MOLPROBITY (49), and the contacts were analyzed with the program LIGPLOT (50). The atomic coordinates and the structure factors have been deposited in the RCSB Protein Data Bank (www.rcsb.org/pdb/) as ID 3L3Q.

RESULTS
Oriented Peptide Library Approach Reveals the Optimal Sequences Binding to the Major cNLS-binding Site of Importin-α-We adapted an oriented degenerate peptide library approach to study the specificity of binding to the major cNLS-binding site of Impα. This approach was first used to determine the sequence requirements for binding of phosphopeptides to SH2 domains (35) and has subsequently been used successfully for a number of applications in a variety of biological systems. The key to using this method is that all peptides bind in the same register. To circumvent the problem that cNLS-like sequences contain a number of positively charged residues that could bind in different registers (11), we added specific flanking sequences on each side of the degenerate positions and kept a specific critical Lys residue fixed. The chosen library contained the sequence GSEFESPXKXXXXEA, with five degenerate positions (X) (Table 1). A Lys at the second position (P2) of the cNLS-binding cleft on Impα has been shown to be the most important determinant of a monopartite cNLS, its substitution destroying the function of the cNLS from simian virus SV40 large tumor antigen (T-ag) (12,24,51). The flanking sequences were taken from the sequences used in an alanine-scanning mutagenesis study (24), where all mutants were established to bind in the same register. Although a monopartite cNLS can bind at both the major and minor sites (11), we did not expect this to complicate matters because of the lower relative affinity of binding of monopartite cNLSs to the minor site, and because the flanking sequences should reduce binding to this site.
A degenerate library of ~2 million distinct peptides of identical length was incubated with Impα proteins, the bound peptides were separated from the bulk of unbound peptides, and the mixture was sequenced by Edman degradation. The abundance of amino acids at each position of the bound peptides was compared with the abundance at the same position in the starting mixture. A single screening experiment therefore yields the order of preference for every amino acid at a given degenerate position, and the comparison of enrichment values between [...]
FIGURE 1. A, [...] was used to screen binding to GST-mImpα2 70-529 bound to beads. Each panel shows the relative abundance of each of the 18 amino acids in a given cycle of sequencing of eluted peptides, compared with control GST beads in the same cycle. B, same as in A, but for hImpα1 67-503. C, same as in A, but for hImpα5 60-511.
We used N-terminally truncated Impα proteins lacking the IBB domain for peptide library screening. The IBB domain has an autoinhibitory function (52) that is reversed by binding to Impβ1, but all evidence suggests that the truncated proteins closely resemble the Impα·Impβ1 complex in terms of cNLS binding (53).
Using mImpα2 70-529 to screen the peptide library yielded the optimal sequence FKKKRR for positions 8-13 in the library (the Lys at position 9 in bold is not degenerate (Fig. 1A and Table 3)). Hydrophobic and positively charged residues were preferred at position 8, with Phe preferred 1.5-fold over Ala and Pro in the second and third place in order of preference. Also strongly enriched were Met, Arg, and Lys. Lys was the only strongly enriched amino acid at position 10, with ~2-fold preference over other enriched amino acids such as Arg, Pro, Met, Val, and Ala. Lys was also the preferred amino acid at position 11, but followed closely by Arg and Met. Arg was strongly enriched at position 12, preferred almost 2-fold over Met, Ala, Lys, and Phe, with Arg also the optimal amino acid at position 13 (preferred ~1.5-fold over Ala, Met, Lys, and Tyr).
A similar sequence, GKKKRK, was obtained for hImpα1 67-503 (Fig. 1B and Table 3), consistent with the high similarity of the mImpα2 and hImpα1 proteins (95% sequence identity). Gly was the only strongly preferred amino acid at position 8, followed by Lys and Arg. Lys was the optimal residue at position 10, followed by Gly and Asn. Lys was very strongly enriched at position 11, preferred more than 2-fold over Asn and Arg. Arg and Lys were significantly enriched at position 12, with Arg preferred ~1.3-fold over Lys. Lys was slightly (1.2-fold) preferred over Arg at position 13.
hImpα5 60-511, representing a different Impα subfamily, showed no preference at position 8 but a pattern similar to that of the other two proteins at positions 10-13, yielding the optimal pattern XKKRKK (Fig. 1C and Table 3). Lys was strongly enriched at position 10, followed by Pro and Met; Arg was slightly preferred over Phe and Lys at position 11; Lys and Arg were nearly comparable in preference at position 12; and Lys was again preferred, followed by Arg and Met, at position 13.
Structure of pepTM-mImpα2 70-529 Complex-To establish the structural basis of binding of the peptide library-derived sequences to Impα, we cocrystallized mImpα2 70-529 with the pepTM peptide (corresponding to the optimal sequence for binding to mImpα2 70-529 (Tables 1 and 2)). The structure (Fig. 2) is similar in all respects to the Impα complexes with monopartite cNLS-like peptides reported previously, particularly the T-ag cNLS (11). (i) The structure of mImpα2 70-529 comprises 10 ARM repeats, each consisting of three α-helices connected by loops (Fig. 2A; after superposition with mImpα2 70-529 from the T-ag cNLS peptide complex, Protein Data Bank (PDB) ID 1EJL, the root mean square distance for 406 Cα atoms is 0.40 Å). (ii) The electron density maps revealed two peptide ligands binding to one protein molecule, in the major (Fig. 2B) and minor cNLS-binding sites. Nine and six peptide residues could be modeled in the major and minor binding sites, respectively. (iii) The peptides bind in an extended conformation, with their main chain running anti-parallel to the direction of the ARM repeat superhelix, as observed in other cNLS peptide complexes. (iv) The B-factors suggest the peptide binds with higher affinity in the major binding site (57.1 Å²) than in the minor binding site (60.0 Å²), as observed for other monopartite cNLS peptides. (v) As observed in other cNLS peptide complexes, most of the hydrogen bonds between the ligand and the protein involve the peptide main chain and the Impα side chains (Fig. 2C). The most favorable contacts involving the peptide side chains occur in positions P2 and P5. Lys in the P2 position has lower B-factors (47.2 Å²) than the rest of the peptide. The side chain of Lys in the P3 position is also well ordered, but the side chain of Lys in the P4 position has poorer electron density, suggesting it forms less favorable interactions. Most importantly, the structure establishes that position 9 in the library corresponds to position P2. This defines the register to interpret the peptide library results.
FIGURE 2. Crystal structure of pepTM-mImpα2 70-529 complex. A, overall structure of the pepTM·mImpα2 70-529 complex. mImpα2 70-529 is shown as a ribbon diagram (pink and yellow colors correspond to the residues defining the major and minor cNLS-binding sites, respectively). The superhelical axis of the repetitive part of the molecule is approximately horizontal. The two cNLS peptides are shown in a stick representation; the peptide bound to the major site is colored cyan, and the peptide bound to the minor site is colored green. B, stereoview of the electron density (drawn with the program PyMOL) in the region of the pepTM peptide bound to the major binding site of mImpα 70-529. All peptide residues were omitted from the model, and simulated annealing was run with the starting temperature of 1000 K. The electron density map was calculated with coefficients 3|Fobs| − 2|Fcalc| and data between 40 and 2.3 Å resolution, and contoured at 1.2 standard deviations. Superimposed is the refined model of the peptide. C, schematic diagram of the interactions between the pepTM peptide and the major binding site of mImpα 70-529. Polar contacts are shown with dashed lines, and hydrophobic contacts are indicated by arcs with radiating spokes. The cNLS peptide residues are labeled with "R." Carbon, nitrogen, and oxygen atoms are shown in black, white, and gray, respectively; prepared with the program LIGPLOT. D, superposition of the peptide in the major cNLS-binding site in the pepTM-(cyan) and T-ag-(yellow) mImpα 70-529 (PDB ID 1EJL) complexes. The Cα atoms of mImpα 70-529 in the two complex structures were used in the superposition.
FIGURE 3. [...] Table 4 for pooled data. Top row, pepTM; middle row, pepTH1; and bottom row, pepTH5.
FIGURE 4. Impα-binding peptides are functional as NLSs in in vitro nuclear import assays. Nuclear import was reconstituted in perforated hepatoma tissue culture cells in the presence of an ATP-regenerating system containing GTP, GDP, and GDP-loaded Ran. Nuclear accumulation of pepTH1- and pepTH5-coupled streptavidin (fst) in the presence of hImpα5·hImpβ1 was followed over time by using a confocal laser scanning microscope. A, confocal laser scanning microscope images are shown for the indicated times at room temperature in the presence of RanGDP. B, results from image analysis are shown, where each data point represents at least five separate measurements for each of Fn, Fc, and background fluorescence. Data are fitted to the function Fn/c = Fn/c max (1 − e^(−kt)), where Fn/c max is the maximal level of nuclear accumulation, k is the rate constant, and t is time in minutes. Filled squares, pepTH5-fst; open circles, pepTH1-fst; and filled triangles, fst. Pooled data are presented in Table 5.
TABLE 4 (footnotes). a, Data represent the mean ± S.E. (n in parentheses) for the apparent dissociation constant (Kd). LB, low binding; Kd could not be determined.
In the minor binding site, the positions P1′ and P2′ are occupied by Lys and Arg, respectively, which appears to be the optimal combination (12). The Arg in particular forms favorable interactions through hydrogen bonds with Ser-360 and Glu-396.
The binding affinities of the peptides to the full-length Impα proteins and their Impβ1 complexes were also measured (Fig. 3 and Table 4). As expected, the peptide affinities for the Impα·Impβ1 complexes were comparable to those for the corresponding truncated Impα proteins lacking the IBB domain, and the peptide affinities for the full-length Impα proteins were at least 3-fold lower than those for the corresponding Impα·Impβ1 complexes or the truncated Impα proteins lacking the IBB domain.
Additionally, the interaction of the peptides with hImpα3·hImpβ1 and hImpα7·hImpβ1 was investigated (Fig. 3 and Table 4). These complexes showed affinities for the peptides that were either comparable to or up to 3.5-fold lower than those of the hImpβ1 complexes with the Impα proteins against which the sequences were originally selected.
The Peptide Library-derived Sequences Are Functional NLSs. We used an in vitro nuclear import assay to assess the ability of the peptide library-selected Impα-binding sequences to act as NLSs. For this purpose, the peptides were cross-linked to Alexa Fluor 488-streptavidin (fst). hImpα5·hImpβ1 was found to be the most efficient import factor for the corresponding sequence fst-pepTH5 (Fig. 4), with an initial nuclear accumulation rate (Fn/c per min) of 1.0 and a maximum accumulation (Fn/c max) of ~2.0 (Table 5). Interestingly, hImpα5·hImpβ1 did not import fst-pepTH1 with high efficiency. The importin dependence of nuclear import was indicated by the fact that no nuclear accumulation of the various fst-peptide derivatives was observed in the absence of exogenously added importins. Also, fst itself, lacking the Impα-binding sequence, did not accumulate to any significant extent.
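The import time courses were fitted to a saturating exponential, Fn/c = Fn/c max (1 − e^(−kt)) (Fig. 4 legend). The following is a minimal Python sketch of that model, not the analysis code used in this study. It assumes that the initial rate can be read as the slope of the fitted curve at t = 0 (Fn/c max multiplied by k); whether the Table 5 values were derived this way is not stated here. The rate constant used below is a hypothetical value chosen only for illustration.

```python
import math


def fnc(t_min, fnc_max, k):
    """Model value Fn/c(t) = Fn/c_max * (1 - exp(-k * t)), with t in minutes."""
    return fnc_max * (1.0 - math.exp(-k * t_min))


def initial_slope(fnc_max, k):
    """Slope of the model at t = 0 (assumed reading of the 'initial rate')."""
    return fnc_max * k


def half_time(k):
    """Time (min) at which the model reaches half of Fn/c_max."""
    return math.log(2.0) / k


# Fn/c_max of ~2.0 is quoted in the text; K_HYPOTHETICAL is NOT a reported value.
FNC_MAX = 2.0
K_HYPOTHETICAL = 0.5  # per minute, illustration only

for t in (0, 1, 2, 5, 10):
    print(f"t = {t:2d} min  Fn/c = {fnc(t, FNC_MAX, K_HYPOTHETICAL):.3f}")
print("slope at t = 0:", initial_slope(FNC_MAX, K_HYPOTHETICAL))
```

In this form, two cargoes with the same plateau can still differ in initial slope through k, which is why the rate and the maximum accumulation are reported separately.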
The transport efficiencies of the various peptides generally appeared to follow the importin binding affinity (compare the results in Tables 4 and 5). Fig. 5 represents a double-reciprocal plot of selected data for the binding affinity and initial transport rate, highlighting the data for pepTH5 and various Imps, and pepTH1 and Impα5 by way of comparison. As can be seen, there is a direct correlation, with the highest binding affinity (that for pepTH5 and Impα5/β1) corresponding to the fastest initial transport rate (for the same combination). This is strong evidence that Imp binding is a limiting step in the nuclear import process (41,54,55).
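The kind of analysis behind a double-reciprocal plot with a regression line can be sketched in a few lines of Python. This is an illustrative sketch only: the Kd and rate values below are invented placeholders, not the data of Tables 4 and 5, and the exact transformation and axis assignment used for Fig. 5 are assumptions.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def reciprocals(values):
    """Element-wise 1/value, as used for a double-reciprocal plot."""
    return [1.0 / v for v in values]


# Hypothetical placeholder data (NOT from Tables 4 and 5).
kd_nM = [5.0, 10.0, 20.0, 40.0]      # apparent dissociation constants
init_rate = [1.0, 0.6, 0.35, 0.2]    # initial import rates (Fn/c per min)

slope, intercept, r = linear_fit(reciprocals(init_rate), reciprocals(kd_nM))
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```

With these made-up numbers r is negative, because a smaller Kd goes with a larger rate; the sign of r depends on which quantities are placed on each axis, whereas its magnitude reflects how closely the points follow a line.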

DISCUSSION
FIGURE 5. [...] Tables 4 and 5 for the binding affinity (nanomolar) (y axis) and initial nuclear import rate (Fn/c init) in minutes (x axis); the initial import rate value was determined after subtraction of the import rate for fst alone (no NLS). The line represents linear regression (r = 0.8925).
The principal sequence determinants of cNLSs and the molecular basis of their action have been known for some time, but understanding of the details of the specificity of binding to the receptor Impα continues to be limited, meaning that the predictive identification of putative cNLSs in novel sequences remains difficult. Here we used an oriented degenerate peptide library approach for the first time to probe the specificity of peptide binding in the major cNLS-binding site in Impα. This approach has been previously used successfully to study peptide binding and modification in a number of biological systems, for example to characterize the binding specificities of SH2 domains (35), PTB domains (56), and 14-3-3 proteins (37), and to determine the specificities of phosphorylation by protein kinases (36) and the specificities of cleavage by proteases (57). An important requirement for this approach is that all peptides bind in the same register, which is established, for example, by the presence of a phosphorylated residue in a fixed position in phosphopeptides when analyzing the binding to phosphopeptide-binding domains. An important limitation of the approach is that the contribution of each residue to overall binding is considered independent, i.e. the presence of a particular amino acid at one position is assumed not to influence the specificity at a neighboring position.
Although this assumption is likely incorrect when considering a single isolated peptide, it appears to function well in a positional scanning context and has been validated in the context of peptide binding by major histocompatibility complex molecules, for example (58).
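The independence assumption can be made concrete with a position-specific scoring sketch. The Python example below is illustrative only: the enrichment values are invented (they are not the selectivity values measured in the library experiments), and the log-additive scoring scheme is an assumption about how such data are commonly combined.

```python
import math

# Hypothetical enrichment ratios for five degenerate positions.
# Values are invented for illustration and are NOT experimental data.
ENRICHMENT = [
    {"K": 2.0, "R": 1.6, "A": 0.5},
    {"K": 2.4, "R": 1.2, "A": 0.4},
    {"K": 1.8, "R": 1.7, "A": 0.6},
    {"R": 2.0, "K": 1.9, "A": 0.7},
    {"R": 1.5, "K": 1.4, "A": 0.9},
]


def log_score(peptide, matrix=ENRICHMENT, default=1.0):
    """Sum of log2 enrichments; each position contributes independently."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of positions")
    return sum(math.log2(pos.get(aa, default)) for aa, pos in zip(peptide, matrix))


def best_peptide(matrix=ENRICHMENT):
    """Under independence, the optimum is the per-position best residue."""
    return "".join(max(pos, key=pos.get) for pos in matrix)


print(best_peptide())
print(round(log_score("KKKRR"), 3), round(log_score("KKARR"), 3))
```

Under this model, replacing one residue changes the score by the same amount whatever the neighboring residues are, which is exactly the simplification that may fail for an isolated peptide.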
A second potential limitation revolves around the ability of peptides to bind in different registers to Impα. We attempted to achieve binding in the same register by (i) fixing a critical Lys residue in the sequence (12,24,51) and (ii) adding specific flanking sequences based on the alanine-scanning mutagenesis study (24), which established that all mutants containing these flanking sequences bound in the same register. The crystal structure of the pepTM·mImpα2(70-529) complex established that the fixed Lys at position 9 of the library corresponded to position P1, according to the established nomenclature (12). Because all three library experiments identified Lys residues as preferred residues at positions 9 and 10, it was reasonable to assume that the peptides bind in the same register as observed for pepTM in the crystal structure (Table 6).
The sequences selected by the peptide library screening are very similar to already characterized cNLSs, particularly in being rich in the positively charged amino acids Lys and Arg (Table 6). The optimal consensus sequences for positions P2-P5 have previously been suggested as K(R/K)X(R/K) (59), K(K/R)X(K/R) (24), KR(R/X)K (12), KRRR (25), and KR(K/R)R or K(K/R)RK (17), all of which are very similar to the sequences selected in our peptide library experiments. There appears to be good agreement that Lys is strongly preferred in the P2 position. The discrimination between Lys and Arg in positions P3-P5 may not be strong, and the preference may indeed depend on the surrounding amino acids in a cNLS and the specific Impα protein used. It is interesting that Arg is selected in our library experiments for hImpα5 in the P3 position, whereas Lys is preferred in hImpα1 and mImpα2. hImpα5 contains an Asp at the position equivalent to Glu-266 in mImpα2 and hImpα1. Crystal structures suggest that this negatively charged residue may be central in determining the preference for a basic side chain in the cNLS at this position. The shorter Asp side chain in hImpα5 could allow more favorable binding of the longer Arg side chain at this position, as suggested by the peptide library experiments. Interestingly, hImpα3 and hImpα7 contain Ser and Asn at the analogous position, respectively, implying that they may display further differences in preference at this position. Nevertheless, Arg appeared to be favored over Lys in the P3 position according to the Ala mutagenesis in the context of c-Myc and T-ag cNLSs, using Saccharomyces cerevisiae Impα (24); this protein contains a Glu at the residue equivalent to mImpα2 Glu-266.
Overall, the sequences selected by hImpα1 and hImpα5, representing different subfamilies of Impα, are very similar, indicating that any functional differences between these two paralogs of Impα are unlikely to result from differences in recognition specificity in the major cNLS-binding site.
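The previously suggested P2-P5 consensus sequences quoted above can be compared mechanically by converting them to regular expressions. The Python sketch below does this with the standard `re` module. The helper names are hypothetical, and the test segment KKRK is an assumption: it takes residues 128-131 of the SV40 T-ag cNLS (PKKKRKV) as occupying P2-P5.

```python
import re

# Consensus strings as quoted in the text for positions P2-P5.
CONSENSUS = [
    "K(R/K)X(R/K)",
    "K(K/R)X(K/R)",
    "KR(R/X)K",
    "KRRR",
    "KR(K/R)R",
    "K(K/R)RK",
]


def consensus_to_regex(consensus):
    """Turn e.g. 'K(R/K)X(R/K)' into 'K[RK].[RK]' (X means any residue)."""
    def expand(match):
        options = match.group(1).split("/")
        # A group that allows X accepts any residue.
        return "." if "X" in options else "[" + "".join(options) + "]"

    body = re.sub(r"\(([A-Z/]+)\)", expand, consensus)
    return body.replace("X", ".")


def matching_consensus(segment, consensus_list=CONSENSUS):
    """Return the consensus strings that the whole segment satisfies."""
    return [c for c in consensus_list
            if re.fullmatch(consensus_to_regex(c), segment)]


TAG_P2_P5 = "KKRK"  # assumed P2-P5 register of the T-ag cNLS
print(matching_consensus(TAG_P2_P5))
```

A strict match is a coarser criterion than the similarity discussed in the text; a sequence that fails one pattern may still differ from it at only a single position.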
The T-ag cNLS differs in only 0-2 positions from our library-selected sequences, suggesting that T-ag is already optimized as a monopartite cNLS in terms of these key positions. The implication is that higher affinities can only be achieved by forming favorable interactions using other positions reaching outside the major cNLS binding site (13,25).
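The 0-2 position difference can be checked by a simple position-by-position comparison. The Python sketch below assumes that the five library-selected residues align with the KKKRK stretch of the T-ag cNLS (PKKKRKV); this alignment is an assumption made for illustration, and the dictionary keys are plain-text labels for the three importins.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


TAG_CORE = "KKKRK"  # assumed alignment: basic stretch of PKKKRKV

# Optimal sequences reported for each importin-alpha in this study.
LIBRARY_SELECTED = {
    "mImp-alpha2": "KKKRR",
    "hImp-alpha1": "KKKRK",
    "hImp-alpha5": "KKRKK",
}

differences = {name: hamming(seq, TAG_CORE)
               for name, seq in LIBRARY_SELECTED.items()}
print(differences)
```

Under this alignment the three selected sequences differ from the T-ag stretch at one, zero, and two positions, respectively.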
Nuclear import assays show that the sequences selected by the oriented peptide library experiments are functional as NLSs. It has been suggested that there are upper and lower thresholds in terms of binding energy to Impα for functional NLSs (24). The initial rate of protein import has been suggested to be linearly correlated with the cNLS·Impα affinity (41,54,55). However, when the affinity becomes too high, the cargo could not be released. The sequences from our library experiments did not exceed such an affinity threshold, presumably because we have optimized only a few interactions in the major cNLS binding site.
In summary, we have successfully used an oriented peptide library approach to find optimal sequences for binding to the major cNLS binding site in Impα. Similar sequences consistent with features of known cNLSs were selected for mouse Impα2, human Impα1, and human Impα5. The crystal structure of the peptide·mImpα2 complex showed the expected binding mode, binding assays demonstrated that these peptides bound to Impα proteins with high affinities, and nuclear import assays showed that the sequences acted as functional NLSs. The results contribute to our understanding of the sequence determinants of cNLSs and may help identify cNLSs in novel protein sequences.