Two Independent Nuclear Localization Signals Are Present in the DNA-binding High-mobility Group Domains of SRY and SOX9

SRY and SOX9, members of the family of high-mobility group (HMG) domain transcription factors, are both essential for testis formation during human embryonic development. The HMG domain is a DNA-binding and DNA-bending motif comprising about 80 amino acid residues. It has been shown that SRY and SOX9 are nuclear proteins. Using normal or mutant SRY-β-galactosidase and SOX9-β-galactosidase fusion proteins in transfection studies involving COS-7 cells, we have identified two nuclear localization signals (NLSs) within the HMG domains of both proteins that can independently direct the fusion proteins into the nucleus. Only mutational inactivation of both NLS motifs resulted in complete exclusion of the fusion proteins from the nucleus. The NLS sequences are located at the N and C termini of the HMG domain and are a bipartite NLS motif and a basic cluster NLS motif, respectively. Both NLS motifs are conserved in the HMG domains of other transcription factors. The implications of the present results are discussed regarding (a) the apparent dual function of certain basic amino acid residues in the HMG domain of SRY in both DNA binding and in nuclear localization and (b) the possible control of SOX9 in early gonadal differentiation at the level of nuclear translocation.

Mammalian sex determination and early gonadal differentiation is a developmental process involving a cascade of regulatory gene interactions. Only a few of these genes, all encoding transcription factors, are known (reviewed in Ref. 1), among them the related genes SRY and SOX9. SRY encodes the Y-chromosomal testis-determining factor, as shown by XY sex reversal in human individuals mutant for SRY (2,3) and by the demonstration of testis formation in chromosomally female mice transgenic for mouse Sry (4). SOX9 on chromosome 17 is an autosomal gene essential for testis development, as mutations in and around this gene cause XY sex reversal in association with the skeletal malformation syndrome campomelic dysplasia (5,6).
Both SRY and SOX9 contain an 80-amino acid DNA-binding motif known as the high-mobility group (HMG) domain that characterizes a whole class of transcription factors (reviewed in Ref. 7). SRY binds to the sequence AACAAT and variants thereof (8) and induces a sharp bend in the DNA (9). The three-dimensional solution structure of the SRY HMG domain complexed with its target sequence has been solved (10), as has a similar complex of the related factor LEF-1 (11). In cell transfection studies, some evidence for transcriptional activation of testis-specific genes by SRY has been presented (12). We have shown in similar transfection assays that SOX9 also functions as a transcription factor, contains a C-terminal transactivation domain (13) and can bind via its HMG domain to the motif AACAAT (14). Recently, mouse Sox9 was found to be expressed in the gonadal anlage of both sexes, with expression increasing in the developing testis and decreasing in the developing ovary, consistent with a role for SOX9/Sox9 in Sertoli cell differentiation (15,16).
As transcription factors, SRY and SOX9 must gain access to the nucleus. Studies on nuclear localization indicate that transport across the nuclear envelope is an active process mediated by one or more nuclear localization signal sequences (NLSs), usually present in the protein itself or in a cofactor (for review, see Refs. 17 and 18). With some exceptions, two main types of NLS motifs exist. One is a short cluster of mainly basic amino acids (arginine and/or lysine), its prototype found in the simian virus 40 large tumor antigen (19). The other is a bipartite NLS motif that comprises two basic amino acids, a spacer of about 10-15 residues consisting of any amino acid, followed by generally three basic residues, as first described for nucleoplasmin (17). Specialized NLS-binding transporter proteins that carry NLS-containing proteins through the nuclear pore complex into the nucleus have been identified recently (20).
Karyophilic NLS sequences are generally identified by their ability to direct an otherwise cytoplasmic protein to the nucleus when fused to it genetically or biochemically and by the effects of deletion or point mutations on nuclear entry (18). Using these approaches with β-galactosidase as a reporter protein, we have identified two independent NLS motifs within the DNA-binding HMG domains of SRY and SOX9 that are both required for complete nuclear translocation. In a previous study, only one of the two NLSs had been identified in SRY (21).
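The two motif classes described above can be written as simple sequence patterns. The following is a minimal sketch, not a validated NLS predictor: the regular expression, the threshold of four basic residues in a five-residue window, and the function names are illustrative assumptions. The two test peptides are the commonly cited signals of the SV40 large tumor antigen and of nucleoplasmin.

```python
import re

BASIC = "KR"  # lysine and arginine

# Assumed pattern: two basic residues, a spacer of 10-15 residues of any
# kind, then three consecutive basic residues.
BIPARTITE = re.compile(r"[KR]{2}.{10,15}[KR]{3}")


def has_basic_cluster(seq, window=5, min_basic=4):
    """True if any window of `window` residues holds >= `min_basic` K/R.

    The window size and threshold are assumptions chosen for illustration.
    """
    return any(
        sum(aa in BASIC for aa in seq[i:i + window]) >= min_basic
        for i in range(len(seq) - window + 1)
    )


def classify_nls(seq):
    """Return a sorted list of the motif types found in a peptide."""
    found = []
    if has_basic_cluster(seq):
        found.append("basic cluster")
    if BIPARTITE.search(seq):
        found.append("bipartite")
    return sorted(found)


sv40_large_t = "PKKKRKV"
nucleoplasmin = "KRPAATKKAGQAKKKK"

print("SV40 large T:", classify_nls(sv40_large_t))
print("nucleoplasmin:", classify_nls(nucleoplasmin))
```

Because the second half of a bipartite signal can itself satisfy the cluster rule, the function reports every motif type it finds rather than a single label.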

EXPERIMENTAL PROCEDURES
Fusion of SRY and SOX9 to β-Galactosidase-The complete open reading frame of SRY was amplified from genomic DNA using Pfu polymerase (Stratagene). The 5′-primer, 5′-GCGGAATTCTTTTTGACAATGCAATCA-3′, spanned the SRY start codon (underlined) and included an EcoRI site for cloning. The 3′-primer, 5′-CGCGAATTCTCGAGCTACTTGTCATCGTCGTCCTTGTAGTCCAGCTTTGTCCAGTGGCT-3′, spanned 18 nucleotides immediately upstream of the stop codon of SRY, followed by 24 nucleotides coding for the FLAG epitope DYKDDDDK, a new stop codon (underlined), and an XhoI site for cloning. Following digestion with EcoRI and XhoI, the PCR fragment was inserted into expression vector pcDNA3 (Invitrogen) cut with the same enzymes, resulting in clone SRY-C-FLAG. Subsequently, a lacZ gene fragment coding for a functional β-galactosidase but lacking the first six nonessential codons of lacZ was derived from plasmid LZE1 (a gift of K. Schughart (Transgene, Strasbourg, France)) by digestion with PstI and XhoI. This fragment was fused in frame to SRY within construct SRY-C-FLAG, using the PstI site of SRY at codon 154. The resulting SRY-β-galactosidase fusion construct was designated SRY-WT. The HMG domain of SOX9 was amplified by PCR from cDNA clone 4.1 (5). The 5′-primer, 5′-GCGGGATCCGCCAGCCATGCACGTCAAGCGGCCCATGAAC-3′, contained a new start codon (underlined) preceded by a Kozak sequence and followed by 21 nucleotides from the 5′-end of the HMG box, and a BamHI site for cloning. The 3′-primer, 5′-CCGGAATTCTCTGCCTCCGCCTGCCC-3′, spanned 18 nucleotides corresponding to codons 185-190, followed by an EcoRI cloning site. Following digestion with BamHI and EcoRI, the PCR product was ligated in frame to the lacZ gene fragment from plasmid LZE1, resulting in pcDNA3-expression construct SOX9-WT.
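The primer design described above can be checked by reading the primers back in the sense orientation. The sketch below is only an illustration: it uses the SRY 3′-primer and the SOX9 5′-primer as printed, with line-break hyphens removed, together with the standard genetic code. The helper names are ours and are not part of the original protocol.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


SRY_3P = "CGCGAATTCTCGAGCTACTTGTCATCGTCGTCCTTGTAGTCCAGCTTTGTCCAGTGGCT"
SOX9_5P = "GCGGGATCCGCCAGCCATGCACGTCAAGCGGCCCATGAAC"

# The 3'-primer anneals to the sense strand, so its reverse complement
# reads as coding sequence: 18 nt of SRY, 24 nt of FLAG, then a stop codon.
sense = reverse_complement(SRY_3P)
sry_tail = translate(sense[:45])

# The SOX9 5'-primer is already in the sense orientation; read from the ATG.
sox9_start = translate(SOX9_5P[SOX9_5P.find("ATG"):])

print("SRY C terminus + FLAG:", sry_tail)
print("SOX9 fusion N terminus:", sox9_start)
print("XhoI site (CTCGAG) in SRY 3'-primer:", "CTCGAG" in SRY_3P)
print("BamHI site (GGATCC) in SOX9 5'-primer:", "GGATCC" in SOX9_5P)
```

Read this way, the SRY 3′-primer encodes the FLAG epitope DYKDDDDK followed by a stop codon, and the SOX9 5′-primer places a methionine directly in front of the first residues of the HMG box.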
For constructs SRY-biNLS, SRY-bcNLS, SOX9-biNLS, and SOX9-bcNLS, double-stranded oligonucleotides encoding the bipartite or basic cluster NLS sequences (biNLS/bcNLS) of SRY and SOX9, respectively, and also containing a new Kozak sequence, start codon, and BamHI and EcoRI restriction sites were ligated in frame to the lacZ gene fragment. All constructs were verified by restriction and DNA sequence analysis.
Mutagenesis and NLS Deletion Constructs-To create amino acid changes, fusion constructs SRY-WT and SOX9-WT were subjected to site-directed mutagenesis using the Chameleon kit from Stratagene following the manufacturer's instructions. Deletions were introduced by PCR using primers that allowed amplification of only parts of the HMG boxes as depicted in Figs. 1 and 2, followed by direct fusion to β-galactosidase as mentioned above. All constructs were verified as above.
Cell Culture and Transient DNA Transfection-COS-7 cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum. For DNA transfection, 1 × 10⁶ cells were incubated in suspension for 1 h with 7 µg (1 pmol) of plasmid DNA constructs by the DEAE-dextran method (22).
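The stated equivalence of 7 µg and 1 pmol of plasmid DNA can be related to plasmid size with a short calculation. The sketch below assumes an average mass of about 660 g/mol per base pair of double-stranded DNA, which is a common approximation and not a value given in the text. The function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation, assumed)


def pmol_from_ug(mass_ug, length_bp):
    """Convert a mass of double-stranded DNA in micrograms to picomoles."""
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS)


def implied_length_bp(mass_ug, amount_pmol):
    """Plasmid length in base pairs implied by a mass/amount pair."""
    return mass_ug * 1e6 / (amount_pmol * AVG_BP_MASS)


# Plasmid size implied by "7 µg (1 pmol)" under the assumed mass per bp.
size_bp = implied_length_bp(7, 1)
print(f"implied plasmid size: about {size_bp / 1000:.1f} kb")
```

Under this assumption the figures correspond to a plasmid of roughly 10.6 kb; a different mass per base pair shifts the estimate proportionally.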
Histochemical Staining-For histochemical staining, transfected cells were cultivated for 42 h and subsequently stained for β-galactosidase activity by incubation with X-gal (0.3 mg/ml) as described (23). The reaction was carried out for 2-6 h at 30°C. Between 20 and 40 X-gal-positive cells were assessed. Microscopy and photography were performed using Zeiss equipment (Axiovert).

RESULTS
SRY Contains Two Independent NLS Motifs within Its HMG Domain-We first investigated the subcellular localization of the transcription factors SRY and SOX9 by indirect immunofluorescence and found both proteins exclusively located in the nuclei of transiently transfected COS-7 cells (data not shown), in agreement with previous findings (15,21). To identify functional NLSs that direct SRY to the nucleus, we turned our attention to the DNA-binding HMG domain of SRY. This appears to be the only functionally relevant part of SRY, as only this domain of 80 amino acids is evolutionarily well conserved (24) and contains nearly all known SRY mutations found in female patients with XY gonadal dysgenesis. Moreover, functional NLSs are frequently found within DNA-binding domains (23,25). By comparison with reported NLS consensus sequences (17-19), we identified a candidate bipartite NLS motif in the N terminus of the HMG domain and a candidate basic cluster NLS sequence in its C terminus. Furthermore, both sequence motifs are well conserved in many HMG domain transcription factors (see Fig. 3 and "Discussion").
Applying a previously reported approach (18, 23), we genetically fused the open reading frame of SRY from codon 1 to 154 including the HMG box to the open reading frame of the β-galactosidase gene using the eukaryotic expression vector pcDNA3. After transient expression of the resulting construct SRY-WT in COS-7 cells, subcellular localization of the SRY-β-galactosidase fusion protein was visualized by 5-bromo-4-chloro-3-indolyl-β-D-galactoside-mediated blue staining for β-galactosidase activity. In this and in all subsequent experiments, 20-40 blue cells were monitored for the subcellular distribution of the blue color, with only minor variations observed between individual cells or transfections. As shown in Fig. 1, the staining was exclusively confined to the cell nucleus. In a control experiment, transiently expressed β-galactosidase alone showed only cytoplasmic staining, whereas transfection of the empty expression vector resulted in no blue staining of cells (not shown). Replacing the basic amino acid residues at positions 61 and 62 and 75-77 by aliphatic residues in constructs SRY-M1 and SRY-M2, respectively, resulted in fusion proteins with reduced nuclear translocation as judged by both cytoplasmic and nuclear staining. The same held true for a mutation of all five above-mentioned basic amino acids in construct SRY-M3 (Fig. 1). These data suggest the presence of a bipartite NLS sequence in the N-terminal part of the HMG domain of SRY.
The observation of some remaining nuclear staining in construct SRY-M3 indicates that a second NLS is present in SRY. We therefore deleted the C-terminal part of the HMG domain beyond residues 114 and 125 in constructs SRY-D1 and SRY-D2, respectively. These constructs, which leave the N-terminal part of the HMG domain intact, resulted in fusion proteins with cytoplasmic and nuclear localization, as did construct SRY-D3 where only the short stretch RPRRK (residues 130-134) containing mostly basic residues was deleted (Fig. 1). These deletion constructs indicate that the basic cluster at the C terminus of the HMG domain comprises a second NLS.
When the C-terminal deletion beyond residue 114 was combined with the fully mutated bipartite NLS at the N terminus of the HMG domain in construct SRY-D1-M3, the resulting protein was now completely confined to the cytoplasm, as indicated in Fig. 1. Exclusively cytoplasmic staining was also obtained when the D3 deletion was combined with a partially or fully mutated bipartite NLS in constructs SRY-D3-M1, SRY-D3-M2, or SRY-D3-M3 (Fig. 1). These results confirm that residues 61-62 and 75-77 constitute a bipartite NLS that functions independently of the basic cluster NLS. They also show that for complete nuclear import, both NLS motifs are required. (It should be noted that the region 138-154 not present in the D3 constructs contains only a single basic residue at position 140.) To demonstrate that the bipartite and the basic cluster NLS of SRY, when analyzed separately and outside the context of the HMG domain, are sufficient to at least partly direct a reporter protein into the nucleus, we fused these motifs alone to lacZ. As depicted in Fig. 1, these constructs SRY-biNLS and SRY-bcNLS resulted in fusion proteins with nuclear and also cytoplasmic localization.
Taken together, these data identify a typical (17) bipartite NLS sequence KRPMNAFIVWSRDQRRK (residues 61-77) in the N-terminal part of the HMG domain of SRY, consisting of two short basic parts (underlined) separated by 12 amino acids, and a second NLS RPRRK (residues 130-134) in the C terminus of the HMG domain of SRY comprising four basic residues plus a proline residue. Basic cluster NLSs in other proteins are also frequently associated with one or more proline residues (18,19).
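The reasoning behind these SRY constructs can be summarized as a simple two-signal rule: nuclear only when both NLS motifs are functional, mixed when one is, cytoplasmic only when neither is. The sketch below encodes the localizations reported in the text and checks them against that rule. Treating every M mutation as inactivating the bipartite NLS and every D deletion as inactivating the basic cluster NLS is our simplifying assumption, and the function name is hypothetical.

```python
# construct: (bipartite NLS functional, basic cluster NLS functional, observed)
# Observed staining as reported in the text: N, N/C, or C.
CONSTRUCTS = {
    "SRY-WT":    (True,  True,  "N"),
    "SRY-M1":    (False, True,  "N/C"),
    "SRY-M2":    (False, True,  "N/C"),
    "SRY-M3":    (False, True,  "N/C"),
    "SRY-D1":    (True,  False, "N/C"),
    "SRY-D2":    (True,  False, "N/C"),
    "SRY-D3":    (True,  False, "N/C"),
    "SRY-D1-M3": (False, False, "C"),
    "SRY-D3-M1": (False, False, "C"),
    "SRY-D3-M2": (False, False, "C"),
    "SRY-D3-M3": (False, False, "C"),
    "SRY-biNLS": (True,  False, "N/C"),  # isolated bipartite motif on lacZ
    "SRY-bcNLS": (False, True,  "N/C"),  # isolated basic cluster on lacZ
}


def predict_localization(bipartite_ok, cluster_ok):
    """Two-signal rule: both -> N, exactly one -> N/C, none -> C."""
    if bipartite_ok and cluster_ok:
        return "N"
    if bipartite_ok or cluster_ok:
        return "N/C"
    return "C"


mismatches = [
    name
    for name, (bi, bc, observed) in CONSTRUCTS.items()
    if predict_localization(bi, bc) != observed
]
print("constructs not explained by the two-signal rule:", mismatches)
```

The rule is qualitative: it does not capture how much nuclear staining remains in the mixed cases, which the staining assay also does not quantify.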
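The residue numbering of the two SRY motifs can be verified directly from the sequences given above. The following sketch splits the bipartite sequence into its two basic parts and spacer and lists the positions of all basic residues. The offset of 57 between SRY residue numbers and HMG domain positions is inferred from the statement in the Discussion that residue 62 corresponds to position 5 in Fig. 3; the helper names are ours.

```python
import re

BI_NLS, BI_START = "KRPMNAFIVWSRDQRRK", 61  # bipartite NLS, residues 61-77
BC_NLS, BC_START = "RPRRK", 130             # basic cluster NLS, residues 130-134
HMG_OFFSET = 57  # inferred: SRY residue 62 is HMG domain position 5


def basic_positions(seq, start):
    """Residue numbers of all lysine/arginine residues in a peptide."""
    return [start + i for i, aa in enumerate(seq) if aa in "KR"]


def to_hmg_position(residue):
    """Convert an SRY residue number to a position within the HMG domain."""
    return residue - HMG_OFFSET


# Split into N-terminal basic pair, spacer, and C-terminal basic triplet.
n_part, spacer, c_part = re.fullmatch(r"([KR]{2})(.+?)([KR]{3})", BI_NLS).groups()

print("basic parts:", n_part, c_part, "| spacer:", spacer, f"({len(spacer)} residues)")
print("basic residues in bipartite NLS:", basic_positions(BI_NLS, BI_START))
print("basic residues in basic cluster:", basic_positions(BC_NLS, BC_START))
print("HMG positions of bipartite NLS:",
      to_hmg_position(BI_START), "-", to_hmg_position(BI_START + len(BI_NLS) - 1))
```

The listing also shows one arginine inside the spacer, which is compatible with the definition of the spacer as any amino acid.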
The replacement of a single arginine residue by glycine at position 62 in SRY, reported in a female patient with XY gonadal dysgenesis (26), was not found to affect nuclear localization (SRY-R62G, Fig. 1). In contrast, a change of arginine to leucine at position 76 in construct SRY-R76L resulted in partial cytoplasmic retention of the SRY fusion protein. This mutation has not yet been described in XY sex-reversed patients.
The HMG Domain of SOX9 Also Contains Two NLS Motifs-As shown in Fig. 3, the N- and C-terminal NLS motifs identified in the HMG domain of SRY are well conserved in other HMG domain-containing transcription factors. As a second example of this class of proteins, we examined SOX9, which shares 50% amino acid sequence identity in its HMG domain with that of SRY (5, 6). Fusion proteins containing β-galactosidase were constructed as above using the 80-amino acid HMG domain of SOX9 plus eight C-terminal residues. As shown in Fig. 2, construct SOX9-WT containing the wild-type HMG domain resulted in exclusively nuclear staining of transiently transfected COS-7 cells. Substitution of the basic residues at positions 106, 107, and 120-122 by aliphatic residues in construct SOX9-M3 led to nuclear as well as cytoplasmic localization of the resulting fusion protein (Fig. 2). A similar subcellular distribution resulted from deletion of the C terminus of the HMG domain after residue 158 in construct SOX9-D1 or from deletion of only the five mainly basic residues PRRRK (residues 176-180) in construct SOX9-D3.
These data indicate the presence of a bipartite NLS motif at the N terminus and of a basic cluster type NLS motif at the C terminus of the HMG domain of SOX9, very similar to the findings with SRY above. As with SRY, each of these NLS motifs is by itself sufficient, at least partly, to direct the fusion protein to the nucleus, as documented by constructs SOX9-biNLS and SOX9-bcNLS. Complete loss of nuclear translocation is only observed after mutational inactivation of both NLS motifs, as shown by constructs SOX9-D3-M1, SOX9-D3-M2 and SOX9-D3-M3 (Fig. 2). (The eight C-terminal residues 183-190 not present in the D3 constructs contain only a single basic residue at position 183.)

FIG. 2. The HMG domain of SOX9 contains two NLS motifs. Top, SOX9-β-galactosidase fusion constructs used in this study. Names of constructs in expression vector pcDNA3 follow the nomenclature used in Fig. 1 and are given at left. The schematic representations show amino acid substitutions and deletions introduced in the HMG domain. The resulting subcellular localization in transfected COS-7 cells is given as nuclear only (N), cytoplasmic only (C), and both nuclear and cytoplasmic (N/C). Basic amino acids in the two NLS motifs are underlined. The numbering indicates the amino acid residues of SOX9. The methionine residue preceding the HMG domain sequence was introduced during cloning. HMG, HMG domain. Bottom, photomicrographs of transfected COS-7 cells expressing the indicated fusion constructs, stained for β-galactosidase activity. Bar indicates 10 µm.

DISCUSSION
We have observed that SRY and SOX9, two proteins essential for normal testis formation during human embryonic development, are located exclusively in the nuclei of transfected cells, in agreement with previous findings (15,21). A bipartite and a basic cluster NLS motif were identified within the HMG domains of both proteins to be necessary and sufficient for effective nuclear import. Both NLS motifs are conserved in the HMG domains of other members of this family of transcription factors (Fig. 3). This is particularly evident for the subgroup of the SRY HMG-box (domain)-related SOX proteins. It is conceivable that most, if not all, HMG domain transcription factors utilize NLS motifs identical or related to those we have identified here. In fact, for the lymphocyte-specific factor LEF-1, which apparently lacks a bipartite NLS, the extended basic cluster at the C terminus of its HMG domain has been identified as its major NLS (Ref. 30; see Fig. 3). Moreover, the recently identified SOX17 isoform that lacks essentially all of the HMG domain except the C-terminal basic cluster sequence was found to be localized in the nucleus and in the cytoplasm of transfected cells (31). By contrast, the full-length SOX17 with the complete HMG domain containing a bipartite and a basic cluster NLS motif very similar to those in SRY and SOX9 (Fig. 3) was completely nuclear (31). This is in line with our observation of two independent NLSs in the HMG domain being necessary for complete nuclear translocation.
While this work was in progress, a similar study by Poulat et al. (21) appeared describing the N-terminal bipartite NLS in the HMG domain of SRY. These authors identified the same basic residues as essential components of this motif as the present study. However, in contrast to our findings, they describe nuclear import of SRY to be controlled solely by the bipartite NLS motif. After deletion of the 20 N-terminal residues of the HMG domain that span this motif in full-length SRY, the little nuclear staining seen in transfected cells with an SRY antibody was interpreted as passive diffusion of the NLS-deleted protein into the nucleus. The same deletion in the context of an SRY-β-galactosidase fusion construct was reported, but not shown, to result in localization only in the cytoplasm. In contrast, in the present study, when the relevant residues in both parts of the bipartite NLS were mutated but the remainder of the HMG domain was kept unchanged in construct SRY-M3, nuclear translocation was reduced but clearly not abolished. Only after additional deletion of the C-terminal basic cluster in construct SRY-D3-M3 did we observe complete nuclear exclusion (Fig. 1). Furthermore, when tested individually and outside the context of the HMG domain, not only the bipartite NLS, but also the basic cluster NLS, was able to direct the reporter protein into the nucleus, although not completely, as shown by constructs SRY-biNLS and SRY-bcNLS (Fig. 1). Corresponding results were obtained for SOX9 (Fig. 2). Our data do not allow us to decide which of the two NLS motifs is stronger. It is possible that under the experimental conditions used by Poulat et al. (21), the basic cluster NLS functioned as a slightly weaker nuclear localization signal than the bipartite NLS and therefore was not apparent in their study.
A weakly functioning basic cluster NLS may also explain why a peptide corresponding to the entire SRY HMG domain except for the first 16 residues failed to translocate coupled rabbit IgG to the nucleus in the study of Poulat et al. (21). It has been shown that weak NLS motifs require a high coupling ratio of peptide/IgG to drive the coupled protein into the nucleus (25).
It is unlikely that another functional NLS, in addition to the two motifs in the HMG domain, is present in SRY. The fact that constructs SRY-D3-M1, -M2, and -M3, where these motifs are mutated, are localized only in the cytoplasm, argues against an NLS in the N-terminal 58 residues. The C-terminal 68 residues of SRY that are missing from these constructs contain only six basic residues that are not clustered. Furthermore, the HMG domain is the only conserved region in SRY sequences from different species (24), suggesting that this domain is the only functionally relevant part. However, for SOX9, where we have only analyzed the HMG domain, we cannot fully exclude an additional NLS elsewhere in the 509-residue SOX9 protein, although a typical NLS motif is not apparent from the sequence.
Two interesting features of the NLSs present in the HMG domain of SRY and SOX9 are shared by other nuclear proteins. The first relates to the phenomenon of multiple karyophilic signals. As in SRY and SOX9, other nuclear proteins contain two mutually independent NLSs that have to be inactivated together to completely abolish nuclear import (25,32). Such an arrangement may reduce the susceptibility of a nuclear protein to total loss of nuclear translocation by NLS-inactivating point mutations. The second feature is the location of the NLSs in a DNA-binding domain, a situation described for several other proteins, including a number of transcription factors such as MyoD (25) or NF-κB (33). The apparent economic design of incorporating basic nuclear targeting sequences together with frequently basic DNA-binding residues in a single domain may be the result of an evolutionary process if, as has been suggested (17), DNA-binding domains were the archetypal targeting signals during evolution of the nuclear membrane. Interestingly, in the case of SRY, the two arginine residues 62 and 75 in the bipartite NLS (residues 5 and 18 in Fig. 3) contribute to the DNA binding of the HMG domain, both by forming salt bridges to phosphate residues and by hydrophobic interactions with sugar residues, as revealed by the NMR structure of the SRY/DNA complex (10). Likewise, the lysine and arginine residues in LEF-1 that correspond to basic residues 61, 62, and 75 in the bipartite NLS of SRY (residues 4, 5, and 18 in Fig. 3) also interact with the DNA backbone in the LEF-1/DNA complex (11). Since the NMR analysis of the SRY-DNA complex did not clearly resolve the residues of the basic cluster NLS (10), it is not known if some of these residues may also make DNA contacts. This is likely, however, in view of the fact that the C-terminal basic residues in the HMG domain of LEF-1 (see Fig. 3) contact the sugar-phosphate backbone extensively in the LEF-1/DNA complex (11).
It thus appears that certain basic residues in both NLS motifs of the HMG domain serve two functions, playing a role both in nuclear import and in DNA binding. Sex reversal in XY females with amino acid substitutions affecting these basic residues in SRY, or in SOX9, may then result not only from impaired DNA binding but also from impaired nuclear uptake of the mutant protein. Whereas one such published mutation in SRY, R62G (26), showed normal nuclear translocation, another mutation tested, R76L, resulted in reduced nuclear import (Fig. 1). It is likely that this or similar mutations will be described in the future.
There is growing evidence that gene transcription can be controlled by regulating the nuclear import of transcription factors. This can be achieved by interacting proteins masking one or several NLSs present in these factors, thus rendering these signals nonfunctional (33,34). Such a masking mechanism could account for the recent observation that SOX9 is localized predominantly in the cytoplasm of cells of the developing gonadal ridge in both sexes prior to day 11.5 of mouse embryogenesis, while it is completely nuclear at later stages of male gonadal development (15). Whatever the nature of this hypothetical masking factor, the role of which may be an essential one in controlling mammalian sex determination, it is likely to interact with one or both NLS motifs in SOX9 identified in this study.

FIG. 3. ... (7, 27-29, 31). The SOX proteins shown represent all seven subgroups of the SOX family (27,29). Amino acids are given from positions 4-20 and 72-78/83 of each HMG domain, with basic residues in capital letters. Basic residues making DNA contacts in the SRY-DNA (10) and LEF-1-DNA complex (11) are marked by an asterisk; no structural data are available for residues from the basic cluster of SRY (10). h, human; m, mouse; y, yeast.