Two Potent Nuclear Localization Signals in the Gut-enriched Krüppel-like Factor Define a Subfamily of Closely Related Krüppel Proteins

The gut-enriched Krüppel-like factor (GKLF) is a newly identified transcription factor that contains three C2H2 Krüppel-type zinc fingers. Previous immunocytochemical studies indicate that GKLF is exclusively localized to the nucleus. To identify the nuclear localization signal (NLS) within GKLF, cDNA constructs with various deletions in the coding region of GKLF were generated and analyzed by indirect immunofluorescence in transfected COS-1 cells. In addition, constructs fusing regions representing putative NLSs of GKLF to green fluorescent protein (GFP) were generated and examined by fluorescence microscopy in similarly transfected cells. The results indicate that GKLF contains two potent, independent NLSs: one within the zinc fingers and the other in a cluster of basic amino acids (called 5′ basic region) immediately preceding the first zinc finger. In comparison, putative NLSs within the zinc fingers and the 5′ basic region of a related Krüppel protein, zif268/Egr-1, are relatively less efficient in their ability to translocate GFP into the nucleus. A search in the protein sequence data base revealed that despite the existence of numerous Krüppel proteins, only two, the lung Krüppel-like factor (LKLF) and the erythroid Krüppel-like factor (EKLF), exhibit similar NLSs to those of GKLF. These findings indicate that GKLF, LKLF, and EKLF are members of a subfamily of closely related Krüppel proteins.

Various mechanisms that are responsible for nuclear localization of eukaryotic transcription factors have been proposed. Most transcription factors contain one or more nuclear localization signals (NLSs) which, when recognized by nuclear transport proteins, result in the translocation of the transcription factor to the nuclear pore complex. Subsequent translocation across the nuclear membrane occurs in an ATP-dependent fashion (1). Inspection of the amino acid sequences of a large number of transcription factors has defined two types of NLSs (2,3). The first type, called a "core" NLS, contains four or more arginine and lysine residues within a hexapeptide and is frequently flanked by acidic residues or "helix-breakers" such as proline and glycine (3). The second type of NLS is "bipartite" and consists of two clusters of basic amino acids separated by a short nonbasic peptide. It is hypothesized that the two clusters of basic amino acids in a bipartite NLS are brought into juxtaposition by protein folding and are subsequently recognized by the nuclear import machinery (2). In an analysis of the sequences of 117 transcription factors, 106 were found to contain one or more core NLSs, whereas relatively few contain a bipartite NLS (2,3). Interestingly, many putative NLSs are present in close proximity to the DNA-binding domains of transcription factors, exemplified by the bZIP proteins c-Fos and c-Jun, and the bHLH proteins Myc, Max, and MyoD1 (3). This conserved arrangement suggests that DNA-binding motifs and nuclear localization signals may have coevolved.
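The working definition of a core NLS given above (four or more arginine and lysine residues within a hexapeptide) can be expressed as a simple sliding-window scan. The following is only a minimal sketch of that definition, not the procedure used in references 2 and 3; the function name `find_core_nls` and the flanked test sequence are hypothetical, while the hexapeptides PKRGRR, SKRGRR, and RKRHTK are those quoted elsewhere in this paper. The sketch ignores the flanking acidic or helix-breaking residues mentioned in the definition.

```python
# Minimal sketch: flag hexapeptides with >= 4 basic (K/R) residues.
# Function and parameter names are illustrative assumptions.
BASIC = {"K", "R"}  # lysine and arginine


def find_core_nls(seq, window=6, min_basic=4):
    """Return (start, peptide) pairs for every window meeting the criterion.

    `start` is 1-based, following the residue-numbering convention of the text.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        peptide = seq[i:i + window]
        if sum(aa in BASIC for aa in peptide) >= min_basic:
            hits.append((i + 1, peptide))
    return hits


if __name__ == "__main__":
    # Hexapeptides quoted in this paper.
    for pep in ("PKRGRR", "SKRGRR", "RKRHTK"):
        print(pep, find_core_nls(pep))
    # Hypothetical flanked sequence, for illustration only.
    print(find_core_nls("GGPKRGRRSW"))
```

Overlapping windows are reported separately, so a single basic cluster can yield more than one hit; merging adjacent hits would be a reasonable refinement.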
We recently identified a novel transcription factor named gut-enriched Krüppel-like factor (GKLF) which contains three C2H2 Krüppel-type zinc fingers (5). Expression of GKLF is enriched in the intestinal tract with the highest level of transcript found in the post-mitotic epithelial cells of the colon. In vitro, expression of GKLF is increased in culture conditions that induce growth arrest such as serum deprivation or contact inhibition. Furthermore, enforced expression of GKLF in transfected cells results in the inhibition of DNA synthesis. Together, these results indicate that GKLF is a growth arrest-associated, epithelial-specific gene. Subsequently, GKLF was independently identified by another group, which named it epithelial zinc finger (EZF), and shown also to be expressed at high levels in the epidermal layers of the skin (6). These findings suggest that GKLF/EZF may be involved in growth regulation and perhaps terminal differentiation of specific epithelial tissues. The primary amino acid sequence in the zinc finger region of GKLF exhibits a high degree of identity with several previously identified Krüppel proteins, including lung Krüppel-like factor (LKLF (7)), erythroid Krüppel-like factor (EKLF (8)), and basic transcription element-binding protein 2 (BTEB2 (9)). Because of the highly homologous nature of the zinc finger sequences of LKLF, EKLF, and BTEB2, it has been proposed that the three belong to the same multigene family (7).
Our previous studies showed that GKLF localized exclusively to the nucleus of cells transfected with a GKLF-expressing plasmid construct (5). To further investigate the structure-function relationship of GKLF with regard to nuclear localization, we determined its NLS in the present study. We show that GKLF contains two potent NLSs, each of which is sufficient to direct GKLF or an unrelated polypeptide into the nucleus. One of the NLSs resides in the zinc fingers and the other in a region (called 5′ basic region) immediately amino-terminal to the first zinc finger. In contrast, by our studies and previous reports, nuclear localization of a related Krüppel protein, zif268/Egr-1, appears to require the participation of both the 5′ basic region and the zinc fingers (4,10). Our results suggest that the Krüppel family of transcription factors can further be divided into subfamilies based on the sequences required for nuclear localization.

EXPERIMENTAL PROCEDURES
DNA Constructs-A GKLF cDNA containing the entire 483 amino acid (aa) open reading frame cloned into the mammalian expression vector, PMT3 (11), was described previously (5). Three mutant constructs with progressive deletions from the 3′ end of the GKLF coding region were generated by digesting the full-length cDNA with appropriate restriction endonucleases (Fig. 1a). PMT3-GKLF-(1-441) contains the 5′ basic region (broadly defined as the 20 amino acids (residues 382-401) immediately preceding the first cysteine of the first zinc finger of GKLF) and a deletion of the carboxyl-terminal 1½ zinc fingers. PMT3-GKLF-(1-401) contains the 5′ basic region and a deletion of all three zinc fingers. PMT3-GKLF-(1-349) contains a deletion of both the 5′ basic region and the zinc fingers. In addition, a construct containing only the 5′ basic region and the three zinc fingers of GKLF was generated (PMT3-GKLF-(350-483)).
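The relationship between the construct boundaries and the two regions of interest can be checked with simple interval arithmetic. The sketch below is illustrative only: the residue ranges of the constructs and of the 5′ basic region (382-401) are taken from the text, whereas the zinc finger range (402-483) is an assumption made here for the example, inferred from the statement that the basic region immediately precedes the first zinc finger and that the open reading frame is 483 aa long. The names `overlap` and `describe` are hypothetical.

```python
# Residue ranges are 1-based and inclusive.
BASIC_REGION = (382, 401)   # from the text
ZINC_FINGERS = (402, 483)   # ASSUMPTION: fingers run from residue 402 to the end of the ORF

CONSTRUCTS = {
    "PMT3-GKLF (full length)": (1, 483),
    "PMT3-GKLF-(1-441)": (1, 441),
    "PMT3-GKLF-(1-401)": (1, 401),
    "PMT3-GKLF-(1-349)": (1, 349),
    "PMT3-GKLF-(350-483)": (350, 483),
}


def overlap(a, b):
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def describe(construct):
    """Report whether the basic region is intact and how many finger residues remain."""
    basic_len = BASIC_REGION[1] - BASIC_REGION[0] + 1
    return {
        "basic_region_intact": overlap(construct, BASIC_REGION) == basic_len,
        "zinc_finger_residues": overlap(construct, ZINC_FINGERS),
    }


if __name__ == "__main__":
    for name, span in CONSTRUCTS.items():
        print(name, describe(span))
```

Under the assumed finger boundaries, the 1-441 construct retains roughly half of the zinc finger residues, which agrees with its description as lacking the carboxyl-terminal 1½ fingers.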
All green fluorescent protein (GFP) fusion proteins were generated in the expression vector, pEGFP-C3 (Clontech Laboratories, Inc.). cDNAs encoding the peptides shown in Fig. 2a were generated by the polymerase chain reaction using appropriate primers and fused to the carboxyl terminus of GFP. All constructs were sequenced to ensure the accuracy of the reading frames and to verify the fidelity of the polymerase chain reaction.
Transfection and Immunocytochemistry-Transient transfections were performed in COS-1 cells by lipofection (Life Technologies, Inc.) as described previously (5). The procedure for indirect immunofluorescence analysis of GKLF in transfected cells using a primary polyclonal rabbit antiserum directed against GKLF and fluorescein isothiocyanate-conjugated secondary goat anti-rabbit serum has also been described (5). For visualization of cells transfected with the GFP fusion constructs, cells were fixed and permeabilized in a manner identical to that described for indirect immunofluorescence (5) and visualized with a Zeiss Axioskop 20 microscope equipped for epifluorescence.

RESULTS
Nuclear localization of GKLF was first examined by indirect immunofluorescence of COS-1 cells that had been transiently transfected with the full-length or various deletion constructs of GKLF in PMT3. Fig. 1 shows the results of one such experiment, which is representative of three independent experiments performed. The expressed GKLF protein was found to be present in the nucleus of cells transfected with constructs that retained the 5′ basic region of GKLF (constructs A, B, C, and E (Fig. 1)). In contrast, deletion of the 5′ basic region resulted in a significant distribution of the protein to the cytoplasm (construct D (Fig. 1)). Cells transfected with the empty PMT3 vector showed only minimal background staining (construct F (Fig. 1)). These results indicate that the 5′ basic region of GKLF is both necessary and sufficient for nuclear localization.
To further delineate the NLS of GKLF, DNA constructs fusing various regions of GKLF to the carboxyl terminus of GFP were generated and analyzed by fluorescence microscopy in transiently transfected COS-1 cells. As seen in Fig. 2, GFP alone was localized throughout the cell (construct F (Fig. 2)). In contrast, the three zinc fingers of GKLF, devoid of the 5′ basic region, were able to redistribute the GFP fusion protein exclusively to the nucleus (construct A (Fig. 2)). Moreover, not all three zinc fingers were required for nuclear localization, since a construct retaining only the amino-terminal 1½ fingers also localized to the nucleus (construct B (Fig. 2)). This latter observation is different from that of a previous study involving a GKLF-related protein, zif268/Egr-1, in which deletion of any of its zinc fingers resulted in a loss of nuclear localization (10). Finally, a construct containing only the 5′ basic region of GKLF was also able to drive the GFP fusion protein into the nucleus (construct C (Fig. 2)). These results indicate that the 5′ basic region as well as the zinc fingers of GKLF function as potent NLSs and that each is capable of independently translocating a heterologous protein into the nucleus.
The NLS of zif268/Egr-1 has been examined in detail in two previous studies (4,10). While one study suggests that the three zinc fingers of zif268/Egr-1 are sufficient for nuclear localization (10), another study indicates that the 5′ basic region of zif268/Egr-1 in combination with its zinc fingers is necessary for full localization to the nucleus (4). Because the 5′ basic region of GKLF alone is a sufficient and strong NLS, we compared the ability of this region of zif268/Egr-1 with that of the corresponding region of GKLF to localize GFP to the nucleus. Our results confirmed the previous observations (4,10) that the 5′ basic region of zif268/Egr-1 functions as an NLS (construct D (Fig. 2)). However, this region appears to be less potent than the corresponding region of GKLF in localizing GFP to the nucleus: in cells transfected with construct D, nuclear fluorescence was relatively weak and cytoplasmic fluorescence could be seen in a fair number of transfected cells when compared with cells transfected with construct C (Fig. 2).
Last, it was previously shown that a point mutation converting an arginine to a glycine in the third zinc finger of zif268/Egr-1, a region with a core NLS sequence, destroyed its nuclear localization (10). Thus, we analyzed the ability of this putative NLS to direct nuclear localization by the GFP fusion approach. Surprisingly, the results show that despite the presence of a core NLS in this region, the fusion protein was only weakly localized to the nucleus (construct E (Fig. 2)). Combining the results of our study and the two previous reports, it appears that the nuclear localizing activities of the 5′ basic region and the zinc fingers of GKLF are stronger than those of the corresponding regions of zif268/Egr-1 (Figs. 1 and 2; Refs. 4 and 10).

DISCUSSION
Protein nuclear localization is a relatively new topic in the field of protein transport. In the last decade, significant progress has been made toward the understanding of the mechanisms that mediate localization of proteins to the nucleus. This is in part due to the availability of data bases containing amino acid sequences of a large number of transcription factors. By comparing these sequences, it becomes clear that most transcription factors depend on specific nuclear localization signals to achieve efficient translocation into the nucleus (2,3). Further investigation of the function of individual NLSs should lead to a better understanding of the mechanisms responsible for nuclear localization. In addition, it is becoming clear that many transcription factors contain more than one NLS and that the rate of nuclear import may be directly related to the number of NLSs present (22). Thus, the process of nuclear localization may reflect yet another level of regulation in transcription factors.
The goal of the present study was to delineate the NLS within a newly identified zinc finger-containing transcription factor, GKLF (5), also known as EZF (6). The results of our study clearly demonstrate that GKLF contains two NLSs, each capable of functioning independently and efficiently to translocate either GKLF or a heterologous protein into the nucleus. One of these NLSs resides in the 5′ basic region of GKLF, which includes a core NLS sequence (four arginines and lysines within a hexapeptide) from aa residues 385-390 (PKRGRR). The second NLS is located within the zinc finger portion of GKLF, specifically within the amino-terminal 1½ zinc fingers, which alone are sufficient to confer nuclear localization.
The finding that the zinc fingers of GKLF contain an NLS is both surprising and interesting, since no putative NLS (core or bipartite) is found within the finger region. Nonetheless, our results are consistent with the conclusion from a previous study that a "global" structure of zinc fingers, rather than specific sequences, serves as an NLS for zif268/Egr-1 (10). A notable difference between GKLF and zif268/Egr-1 is that the latter requires the participation of all three zinc fingers for efficient nuclear translocation while GKLF requires only the first 1½ fingers. Although both proteins belong to the Krüppel family of transcription factors due to conservation of their zinc finger sequences, they appear to have diverged sufficiently that their signals for nuclear localization are structurally different.
A comparison of GKLF's aa sequence with those stored in the GenBank™ data base revealed several transcription factors with highly homologous sequences to the zinc finger region of GKLF. These proteins include LKLF, EKLF, and BTEB2 (5). In fact, before the publication of our study on the identification of GKLF, Lingrel and colleagues proposed that LKLF, EKLF, and BTEB2 belong to the same multigene family (7). Indeed, the percent amino acid identity between the zinc finger region of GKLF and those of LKLF, EKLF, and BTEB2 is 91, 85, and 82%, respectively. If instead the 20 amino acids within the 5′ basic region of these proteins are compared, GKLF and LKLF are 90% identical while GKLF and EKLF are 65% identical (Fig. 3). More importantly, the 5′ basic region of GKLF contains an identical core NLS to that of LKLF (PKRGRR), which is nearly identical to that of EKLF (SKRGRR). Since the 5′ basic region of GKLF was shown to function as a potent NLS, and since GKLF is highly similar to both LKLF and EKLF in the corresponding region, we would predict that the 5′ basic regions of LKLF and EKLF would also function as strong NLSs.
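The percent identity figures quoted above follow from counting matching positions between two aligned sequences of equal length. As a rough illustration, assuming an ungapped position-by-position comparison (the alignment method behind Fig. 3 is not stated here), 90% identity over 20 residues corresponds to 18 identical positions and 65% to 13. The function name `percent_identity` is hypothetical; the only sequences used are the two core NLS hexapeptides quoted in the text.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length, pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Core NLS hexapeptides of GKLF/LKLF (PKRGRR) and EKLF (SKRGRR): 5 of 6 positions match.
    print(round(percent_identity("PKRGRR", "SKRGRR"), 1))
    # Identical-residue counts implied by the quoted percentages over 20 positions.
    print(round(0.90 * 20), round(0.65 * 20))
```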
The one exception to the hypothesis proposed by Lingrel and colleagues (7) seems to be BTEB2. Despite an overall similarity in the aa sequence between the zinc fingers of GKLF and BTEB2 (82%), the sequences in the 5′ basic region of the two proteins are very different, sharing only 15% identity (Fig. 3). In fact, when other Krüppel proteins with conserved zinc finger sequences are compared, the 5′ basic region of BTEB2 is more related to a different group of proteins, which include BKLF, CPBP, and SP1 (Fig. 3). The 5′ basic regions of two of these proteins, BTEB2 and SP1, do not even contain a core NLS. The aa sequences of the 5′ basic region of another group of zinc finger proteins, including early growth response α, transforming growth factor β-inducible early gene, and GC box-binding protein, are even more divergent from those of the GKLF family of proteins (Fig. 3). In fact, no sequences that even resemble an NLS can be identified in this group. Taken together, our study suggests that the Krüppel family of transcription factors can be divided into subfamilies based on homology of the 5′ basic region, a region clearly shown to be important for the nuclear localization of GKLF. Our study also demonstrates that GKLF, LKLF, and EKLF are indeed closely related members of the same subfamily whereas BTEB2 belongs to a different subfamily.
Our study indicates that the 5′ basic region of zif268/Egr-1 contains an NLS, which functionally does not appear to be as strong as that of GKLF. This result is consistent with those of two previous studies. In one study (4), the 5′ basic region of zif268/Egr-1 was fused to β-galactosidase, and in another study (10), the region was retained in constructs in which all three zinc fingers of zif268/Egr-1 were deleted. In each case, an incomplete nuclear localization was observed. An inspection of the aa sequence in the 5′ basic region of zif268/Egr-1 and other related proteins such as Egr-2 and Egr-3 (Fig. 3) showed that they do not contain a core NLS but exhibit characteristics of a bipartite NLS. It is possible that secondary or tertiary structure may contribute to the function of this region as an NLS. More likely, however, is that this region contributes to the overall nuclear localization of zif268/Egr-1 when it is associated with the protein's zinc fingers as suggested previously (4,10).
Another interesting finding derived from the analysis of the zif268/Egr-1 NLS is the effect of the carboxyl half of its third zinc finger in translocating GFP to the nucleus (construct E (Fig. 2)). Previous studies indicate that a mutation of the first arginine residue in this region in the context of the whole protein destroyed nuclear localization despite the fact that the zinc finger structure was maintained (10). These results suggest that this peptide sequence should be a potent NLS. Indeed, a core NLS sequence is present in this region, although it appears in an infrequently observed pattern where an arginine residue is separated from a lysine residue by two nonbasic aa residues (RKRHTK). In the study by Boulikas (3), NLSs with this type of configuration account for only 5% of 271 core NLSs examined. When determined empirically, this particular motif was shown to be relatively inefficient in directing a fused albumin protein into the nucleus (21). This result is consistent with our finding that this region by itself confers fairly poor nuclear localization (construct E (Fig. 2)).
In conclusion, we have shown that GKLF contains two potent and independent nuclear localization signals, the sequences of which are highly conserved in two other Krüppel-like factors, LKLF and EKLF. In addition, by sequence and/or structural analysis of the 5′ basic regions of other Krüppel proteins, three additional subfamilies are identified, each predicted to utilize this region to a somewhat different extent for nuclear localization. These differences allow the separation of the various Krüppel proteins into distinct subfamilies and may reflect differences in the mechanisms regulating nuclear import among the subfamilies.