Crystal Structure of the Nuclear Matrix Targeting Signal of the Transcription Factor Acute Myelogenous Leukemia-1/Polyoma Enhancer-binding Protein 2αB/Core Binding Factor α2

Transcription factors of the acute myelogenous leukemia (AML)/polyoma enhancer-binding protein (PEBP2α)/core-binding factor α (CBFA) class are key transactivators of tissue-specific genes of the hematopoietic and bone lineages. AML-1/PEBP2αB/CBFA2 proteins participating in transcription are associated with the nuclear matrix. This association is solely dependent on a highly conserved C-terminal protein segment, designated the nuclear matrix targeting signal (NMTS). The NMTS of AML-1 is physically distinct from the nuclear localization signal, operates autonomously, and supports transactivation. Our data indicate that the related AML-3 and AML-2 proteins are also targeted to the nuclear matrix in situ by analogous C-terminal domains. Here we report the first crystal structure of an NMTS in an AML-1 segment fused to glutathione S-transferase. The model of the NMTS consists of two loops connected by a flexible U-shaped peptide chain.

The spatial arrangement of the components of gene expression is important to their function. Chromosomal translocations that relocate the subnuclear targeting sequences of transcription factors can potentially contribute to the altered nuclear organization and related changes in gene expression that are characteristic of cancer cells. It is important, therefore, to understand how subnuclear targeting signals localize transcription factors.
The transcriptionally active form of the hematopoietic transcription factor AML-1 contains a unique C-terminal 31-amino acid nuclear matrix targeting signal (NMTS). This sequence directs AML-1 to nuclear matrix-associated subnuclear sites that support transcription. Furthermore, the NMTS of AML-1 can direct the heterologous GAL4 to the nuclear matrix (8, 9). Therefore, the NMTS represents an autonomous protein module with sufficient structural determinants to mediate subnuclear targeting. Domains analogous to the NMTS are conserved in the ubiquitous AML-2 and bone-related AML-3 regulatory proteins. Although subtle differences exist among the NMTS sequences of AML-1, AML-2, and AML-3, their NMTS segments may have similar targeting functions. The structure of the NMTS can therefore provide essential insight into the properties of the AML factors that mediate trafficking to and/or binding to nuclear matrix-associated transcription sites. In this paper, we report the first crystal structure of the NMTS peptide, in this case fused with glutathione S-transferase, at 2.7-Å resolution.

EXPERIMENTAL PROCEDURES
Crystallization and Data Collection-Detailed procedures for protein expression, protein purification, crystallization, and data collection have been reported earlier (10). The space group of the crystal is P4₃2₁2, with cell dimensions of a = b = 93.4 Å, c = 57.6 Å. The final Rsym of the data set is 11%, and the data completeness is 94.9% to 2.7 Å.
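As a rough plausibility check on the cell dimensions quoted above, the following minimal sketch computes the tetragonal cell volume and a Matthews coefficient. The chain mass of about 30 kDa and the presence of one GST-NMTS chain per asymmetric unit are assumptions made for illustration only; neither value is stated in the text.

```python
# Cell dimensions from the text: a = b = 93.4 Å, c = 57.6 Å (tetragonal).
A_ANGSTROM = 93.4
C_ANGSTROM = 57.6

# P4(3)2(1)2 has 8 general positions per unit cell.
GENERAL_POSITIONS = 8

# ASSUMPTIONS (not from the text): one chain per asymmetric unit,
# and an approximate chain mass of 30 kDa.
ASSUMED_CHAINS_PER_ASU = 1
ASSUMED_MASS_DA = 30_000.0


def tetragonal_cell_volume(a, c):
    """Volume of a tetragonal cell (a = b, all angles 90 degrees) in Å^3."""
    return a * a * c


def matthews_coefficient(volume, chains_in_cell, mass_da):
    """Crystal volume per unit of protein mass, in Å^3/Da."""
    return volume / (chains_in_cell * mass_da)


volume = tetragonal_cell_volume(A_ANGSTROM, C_ANGSTROM)
vm = matthews_coefficient(
    volume, GENERAL_POSITIONS * ASSUMED_CHAINS_PER_ASU, ASSUMED_MASS_DA
)
print(f"cell volume = {volume:.0f} Å^3")
print(f"V_M = {vm:.2f} Å^3/Da (under the assumed mass)")
```

Under these assumptions the coefficient falls near 2.1 Å^3/Da, within the range commonly seen for protein crystals; a different assumed mass would shift the value proportionally.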
Structure Determination and Refinement-The phase determination of GST-NMTS was carried out by the molecular replacement method using AMoRe (11). The search model consisted of the coordinates of GST fused to a 6-amino acid peptide (Ref. 12, Protein Data Bank entry 1GNE), omitting the six fused residues, the water molecules, and glutathione. The molecular replacement was carried out with data between 10- and 4.0-Å resolution. The cross-rotation calculation produced a peak with a height of 9σ (3σ above the next highest non-symmetry-related solution). The resulting solution from the translation search gave an R-factor of 35.5% and a correlation coefficient of 68.1%. After rigid body refinement of the molecular replacement solution in the resolution range of 8-4 Å (implemented in AMoRe), the R-factor decreased to 33.6%. Simulated annealing was carried out with the program X-PLOR (13), followed by positional and temperature factor refinement using the restrained conjugate gradient least-squares method. Refinement was conducted in the resolution range of 6-2.7 Å. The starting R-factor was 38% and Rfree was 38% (10% of the data were set aside for free R calculations). Residues 218-260 were built into the 2Fo − Fc, αc electron density map using the program O (14). No solvent molecules were added. Refinement continued until convergence was obtained. The final structure was refined to an Rfree of 31.0% and an R-factor of 20.9%. Refinement details and statistics are summarized in Table I. The root mean square deviations of bond distances and bond angles from ideal geometry are 0.005 Å and 1.2°, respectively.
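The R-factor and free R values quoted above follow the conventional definition R = Σ| |Fo| − |Fc| | / Σ|Fo|, evaluated over the working reflections and over the set-aside reflections, respectively. The sketch below illustrates that arithmetic only. The amplitudes and the free-set flags are invented toy values, and the scale factor between observed and calculated amplitudes that refinement programs apply is omitted for brevity.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R: sum(||Fo| - |Fc||) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_by_flag(values, free_flags):
    """Separate values into (working set, free set) using boolean flags."""
    work = [v for v, free in zip(values, free_flags) if not free]
    free_set = [v for v, free in zip(values, free_flags) if free]
    return work, free_set


# HYPOTHETICAL toy amplitudes, not data from the paper.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 50.0, 70.0, 90.0, 30.0, 10.0]
f_calc = [90.0, 85.0, 50.0, 45.0, 20.0, 55.0, 63.0, 99.0, 27.0, 14.0]
# One reflection in ten is flagged as "free", mirroring the 10% in the text.
free_flags = [False] * 9 + [True]

obs_work, obs_free = split_by_flag(f_obs, free_flags)
calc_work, calc_free = split_by_flag(f_calc, free_flags)

r_work = r_factor(obs_work, calc_work)
r_free = r_factor(obs_free, calc_free)
print(f"R = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the free reflections are excluded from refinement, the free R is normally higher than the working R, as in the final values reported above (31.0% versus 20.9%).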
In Situ Nuclear Matrix Isolation and Indirect Immunofluorescence Analysis-In situ nuclear matrices were prepared as described (9). ROS 17/2.8 cells on coverslips were washed in phosphate-buffered saline and extracted twice in CSK buffer (15) for 15 min each. DNase digestion was performed twice in digestion buffer (CSK buffer with 50 mM NaCl) containing 100 μg/ml DNase I for 30 min, followed by extraction in digestion buffer containing 0.25 M (NH4)2SO4 for 10 min. The coverslips were fixed in 4% formaldehyde in phosphate-buffered saline. The primary antibodies were anti-XPRESS (1:500 dilution of a mouse mAb, purchased from Invitrogen) or rabbit polyclonal antibodies against AML-1, AML-2, and AML-3 (8). The secondary antibody was incubated for 1 h at 37°C and was either a fluorescein isothiocyanate-conjugated goat anti-rabbit antibody (1:400, Jackson ImmunoResearch) or a Texas red-conjugated donkey anti-mouse antibody (1:400, Jackson ImmunoResearch). DNA content was evaluated by 4′,6-diamidino-2-phenylindole staining (5 mg/ml 4′,6-diamidino-2-phenylindole in phosphate-buffered saline containing bovine serum albumin and 0.05% Triton X-100). Cells were mounted in Vectashield H-1000. Microscopic images were obtained using a CCD camera interfaced with a digital microscope system and displayed using the Metamorph software.

RESULTS AND DISCUSSION
Overall Structure of the GST-NMTS Fusion Protein-The three-dimensional structure of an AML-1 segment (residues 346-394 in AML-1), containing the NMTS and fused to glutathione S-transferase (called GST-NMTS), was determined to 2.7-Å resolution by x-ray crystallography. The statistics of the diffraction data and the final model are summarized in Table I. The electron density map around the link and the fused AML-1 portion, including the NMTS, is shown in Fig. 1A. The numbering system of the GST-NMTS fusion protein is used throughout this paper unless otherwise specified. The GST-NMTS structure is composed of three regions (Fig. 1B): the GST region, the link region including the thrombin cleavage site and the kinase site, and the fused AML-1 fragment. This fragment corresponds to residues 346-373 in AML-1 (8) and contains the N-terminal part of the NMTS and the turn preceding the NMTS segment. The eight C-terminal amino acids of the NMTS and the adjacent proline-rich segment (PPPYPGEFIVTD) are disordered in the final structure, suggesting that the C-terminal segment of the NMTS represents a flexible component of the overall structure.
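Because residues are discussed in GST-NMTS numbering while the fragment is also described in AML-1 numbering, a small conversion helper can be useful when comparing with other reports. The sketch below assumes a constant offset of 113, inferred from the ordered fused fragment (residues 233-260 in the fusion protein, per the Fig. 1 legend) corresponding to residues 346-373 in AML-1. The function names are illustrative, and the offset applies only to the fused AML-1 portion, not to the GST or link regions.

```python
# Fused AML-1 fragment: 233-260 in GST-NMTS numbering (Fig. 1 legend)
# corresponds to 346-373 in AML-1 numbering (text).
FUSION_START = 233
AML1_START = 346
OFFSET = AML1_START - FUSION_START  # inferred constant offset of 113

# ASSUMPTION: the same offset holds for the whole fused AML-1 segment,
# including the disordered residues that follow residue 260.


def to_aml1_numbering(fusion_residue):
    """Convert a GST-NMTS residue number to AML-1 numbering."""
    if fusion_residue < FUSION_START:
        raise ValueError("residue lies in the GST or link region, not in AML-1")
    return fusion_residue + OFFSET


def to_fusion_numbering(aml1_residue):
    """Convert an AML-1 residue number to GST-NMTS numbering."""
    if aml1_residue < AML1_START:
        raise ValueError("residue precedes the fused AML-1 fragment")
    return aml1_residue - OFFSET


for name, number in [("Phe", 240), ("Gly", 254), ("Ser", 260)]:
    print(f"{name}-{number} (GST-NMTS) = {name}-{to_aml1_numbering(number)} (AML-1)")
```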
The structure of the GST region is similar to other GST structures (12, 16-20), consisting of four β-strands and three α-helices in the N-terminal α/β domain and five α-helices in the C-terminal α domain (Fig. 1B). The root mean square deviation between the Cα atoms of the GST domain in the final model and the Cα atoms of the Schistosoma japonicum GST structure is 0.9 Å. The link region forms an α-helix and a turn connecting the GST domain to the AML-1 segment. The NMTS region protrudes from the GST segment and resides in the space created by ordered GST moieties. As shown in Fig. 2A, the NMTS adopts a finger-shaped loop region (Loop I), a hinge-shaped glycine-rich turn, and a β-strand (Strand II). These three regions are nearly coplanar. The approximate measured angle between the backbone of Loop I and the backbone of Strand II is 72.7°. The region of AML-1 adjacent to the NMTS (residues 233-237, represented in yellow in Fig. 1B) forms a turn.
The Loop I Region of the NMTS Structure-Loop I, which contains several residues highly conserved among members of the AML family, is shaped like a finger. Although the side chain of Phe-240 is disordered, the position of its Cβ atom indicates that it should still bulge from the tip of the finger. The side chains of the adjacent residues, Thr-241 and Tyr-242, also protrude upward from the tip (Fig. 2A). These three residues are highly conserved among AML-1 and AML-3 proteins (Fig. 3) and are therefore likely to be important either in nuclear matrix binding or in maintaining structural integrity. In the crystal structure, the tip region is followed by a Ser-Pro-Thr-Pro segment, which twists the position of the next two conserved residues, Val-247 and Thr-248, and forms a bump slightly above the plane of the NMTS segment. Ser-249, preserved among all members of the AML/PEBP2α/CBFA family, lies slightly beneath the plane and is oriented toward Strand II (Fig. 2A).
The Cα atoms of the Loop I region exhibit high similarity to the CDR3 regions of the variable domains of antibodies (14). The two fragments with the best matches are residues 94-103 of the Fab segment of immunoglobulin Kol (21) and residues 39-49 of the variable domain of the Bence-Jones protein REI (22). The fragments in Fab and REI belong to the antigen-binding site. The root mean square differences between the Cα atoms are 1.6 and 1.8 Å for Fab and REI, respectively. This similarity suggests that the orientation of conserved amino acids in Loop I of the NMTS may have an analogous function in associating with a docking site in the nuclear matrix.
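The root mean square differences quoted above compare matched Cα positions of two fragments. As an illustration of that quantity, the sketch below computes it for two equal-length lists of coordinates. It assumes the fragments have already been optimally superposed (a least-squares fit such as the Kabsch method would normally be applied first and is not shown), and the coordinates are invented for demonstration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square difference between matched 3D points, in input units.

    ASSUMPTION: the two coordinate sets are already superposed; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# HYPOTHETICAL C-alpha traces (Å); not coordinates from the structure.
fragment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 1.6), (3.8, 0.0, 1.6), (7.6, 0.0, 1.6)]

print(f"RMSD = {rmsd(fragment_a, fragment_b):.1f} Å")
```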
The Glycine-rich Turn of the NMTS Structure-A glycine-rich turn (GIGIG) is located at the bottom of Loop I and Strand II. The side chain of Ile-251 extends out from the bottom. The side chain of Ile-253 is disordered, although the position of its Cβ atom indicates that the alkyl side chain points to the inner face of the bottom of Loop I and Strand II. The Cα atoms of the three glycine residues are all located on the outside surface of the turn (Fig. 2A). A similar glycine-rich motif has been associated with nucleotide-binding sites, such as the Rossmann fold (23), which contains a β-strand followed by a glycine-rich loop and an α-helix. In protein kinases and in protein phosphatases, the highly conserved P loop (GXGXXG) interacts with the phosphate group of the nucleotide. However, this glycine-rich sequence is not conserved among AML/PEBP2α/CBFA proteins (Fig. 3). Although the first glycine is preserved, the second glycine exists only in the AML-1 family and is replaced by serine in the AML-3 and AML-2 families. The last glycine is conserved only in the AML-1 and AML-3 proteins. A deletion occurs at this residue in AML-2. We propose that the GIGIG fragment may serve as a pivot point that permits rotation between Loop I and Strand II to promote nuclear matrix association.
FIG. 1. A, the 2Fo − Fc electron density map around the C terminus of the GST-NMTS fusion protein, contoured at the 1.0σ level, was made using program O (14). The model, shown in white, consists of the link region (residues 218-232), the turn region (residues 233-237), and the NMTS region (residues 238-260).
The Strand II Region of the NMTS Structure-Strand II is composed of less conserved residues. Only the first three residues are conserved between AML-1 and AML-3: methionine, serine, and alanine. The remaining three residues, Met-258, Gly-259, and Ser-260, are absent in the AML-3 family and substituted by serine, valine, and alanine in AML-2. Strand II is a β-strand with Ser-256, Met-258, and Ser-260 pointing toward Loop I (Fig. 2A). However, the side chain of Met-255 extends under the plane formed by Loop I and Strand II and points away from Loop I. The side chain of Ala-257 also faces away from Loop I.
Residues 261-274, which contain the C terminus of the NMTS region, are disordered in the final structure. As expected, the Cα atoms of Strand II are similar to those of the β-strands in other proteins. The closest fragment consists of residues 32-37 of REI, with a root mean square difference of 0.3 Å. In addition, the amino acids after residue 37 in REI form a loop in the crystal structure. Consistently, segments similar to Strand II in other proteins (e.g. acid proteinase, dihydrofolate reductase, serine protease, and rhodanese) are all followed by surface loops. Furthermore, residues 378-383 of β-glycosidase (RYHLYM) (24), the segment in the RELIBase sequence data base with the highest sequence identity (67%) to the C-terminal region of the NMTS (RYHTYL), form a loop in the crystal structure. However, none of the entries in the sequence data base shows similarity with the full stretch of residues 261-274 of the NMTS. Hence, residues 261-262 in the NMTS are probably an extension of Strand II, while residues 263-268 adopt a loop conformation (Loop II, Fig. 2B). Therefore, we propose that Strand II and the C terminus of the NMTS comprise a second loop that participates in nuclear matrix association (Fig. 2B).
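The 67% identity quoted above can be reproduced by a position-by-position comparison of the two six-residue segments given in the text. The sketch below assumes an ungapped alignment of equal-length sequences, which is a simplification of what a data base search would do; the function name is illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical residues in an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Segments quoted in the text.
BETA_GLYCOSIDASE_378_383 = "RYHLYM"
NMTS_C_TERMINAL_SEGMENT = "RYHTYL"

identity = percent_identity(BETA_GLYCOSIDASE_378_383, NMTS_C_TERMINAL_SEGMENT)
print(f"identity = {identity:.0f}%")  # 4 of 6 positions match
```

Four of the six positions are identical (R, Y, H, and the second Y), which rounds to the 67% reported.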
FIG. 3. Sequence alignment of the segments containing the NMTS in the AML family, homologous to residues 346-387 in AML-1/CBFA2. The secondary structure in the NMTS region of GST-NMTS is listed above the sequence alignment. Residues conserved in more than two families are marked in gray (see Fig. 2B). The Loop regions in the NMTS are proposed to interact with the nuclear matrix. The two loops are linked by a highly flexible GIGIG hinge ("turn"), which may function to adjust the relative positions of the loops. The residues at the tops of the loops are highly conserved among AML family members.
FIG. 1. B, an overall stereo view of the Cα backbone of GST-NMTS, from the same angle as in A. The GST region (residues 1-217) and the link region (residues 218-232) are shown in blue and green, respectively. The turn region (residues 233-237) and the NMTS region (residues 238-260) of the fused AML-1 fragment are shown in yellow and cyan, respectively. The figure was prepared by SETOR (30).
The Effect of Crystal Packing on the NMTS Structure-We have also considered the effects of crystal packing on the final NMTS structure. There is one hydrogen bond and one van der Waals interaction between residues in the NMTS and the symmetry-related GST (Fig. 4). The main chain carboxyl group of Gly-254 forms a 2.7-Å hydrogen bond with the Oδ1 atom of Asp-38 of the symmetry-related molecule. In addition, Gly-259 has a 4.0-Å van der Waals contact with Gln-43 of the neighboring molecule. These interactions may help to stabilize the β-strand in the crystal structure. Therefore, the structure of the NMTS in solution may not be the same as the peptide structure reported here. However, we propose that the model we have presented, namely two finger-like loops connected by a U-shaped peptide chain, is a biologically meaningful representation of the NMTS peptide when it is a component of the native AML protein.
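Packing contacts such as those described above are usually identified by measuring interatomic distances between the molecule and its symmetry mates. The sketch below shows only that distance step. The coordinates are invented, the cutoffs of 3.5 Å and 4.5 Å are assumed illustrative thresholds rather than values from the text, and the symmetry operation that generates the neighboring molecule is presumed to have been applied already.

```python
import math

# ASSUMED illustrative cutoffs (Å); not taken from the paper.
H_BOND_MAX = 3.5
VDW_MAX = 4.5


def classify_contact(atom_a, atom_b):
    """Return (distance, label) for two atoms given as (x, y, z) in Å.

    Distance alone does not prove a hydrogen bond; donor/acceptor chemistry
    and geometry would also need to be checked.
    """
    distance = math.dist(atom_a, atom_b)
    if distance <= H_BOND_MAX:
        label = "hydrogen-bond range"
    elif distance <= VDW_MAX:
        label = "van der Waals range"
    else:
        label = "no contact"
    return distance, label


# HYPOTHETICAL coordinates chosen to reproduce the two distances in the text.
contacts = {
    "Gly-254 O ... Asp-38 OD1 (symmetry mate)": ((0.0, 0.0, 0.0), (2.7, 0.0, 0.0)),
    "Gly-259 ... Gln-43 (symmetry mate)": ((0.0, 0.0, 0.0), (0.0, 4.0, 0.0)),
}

for name, (atom_a, atom_b) in contacts.items():
    distance, label = classify_contact(atom_a, atom_b)
    print(f"{name}: {distance:.1f} Å, {label}")
```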
We postulate that the two loops of the NMTS interact with a putative protein or nucleic acid acceptor in the nuclear matrix, as supported by mutagenesis results described below. The flexibility of the hinge region may accommodate the recognition.
Existing mutagenesis data are consistent with the proposed model. Deletion of 4 amino acids at the C terminus of the AML-3 NMTS almost completely eliminated nuclear matrix binding (Footnote 2). In addition, a 3-amino acid mutation in the Strand II region of the NMTS in mouse AML-1B (GMSA, residues 254-257, mutated to KKSK) did not affect nuclear matrix interaction (25).
The NMTS Sequence in AML-3 Is Necessary for Nuclear Matrix Targeting-The sequence alignment of AML proteins in Fig. 3 suggests that the conserved amino acids of the NMTS function in vivo to direct AML/PEBP2α/CBFA transcription factors to nuclear matrix-associated subnuclear sites that support transcription. The three-dimensional structure obtained from x-ray diffraction can then serve as a model for all nuclear matrix targeting signals in this family. We therefore examined whether AML-2 and AML-3 are nuclear matrix associated, as we have previously shown for AML-1. ROS 17/2.8 osteosarcoma cells were transfected with constructs expressing epitope-tagged AML-1 (Fig. 5A), AML-2 (Fig. 5B), or AML-3 (Fig. 5C). Nuclear matrices were prepared, and the presence of AML factors was assessed by in situ immunofluorescence microscopy. The results shown in Fig. 5 indicate that AML-1, AML-2, and AML-3 each associate with the nuclear matrix.
We then addressed whether the NMTS of AML-3/CBFA1 resides in the C terminus, analogous to the location of the NMTS in AML-1. We generated a mutated AML-3/CBFA1 lacking the NMTS (C-terminal deletion of amino acids 362-513) and examined nuclear matrix targeting of this mutant protein by immunofluorescence microscopy. Fig. 6 illustrates that both the full-length AML-3 (Fig. 6B) and the C-terminal NMTS deletion (Fig. 6E) exhibit a punctate distribution in nuclei of detergent-extracted cells. DNA staining of the same cells is shown in Fig. 6, A and D, respectively. The full-length AML-3 is retained in the nuclear matrix (Fig. 6C), whereas the C-terminal deletion lacking the NMTS (Fig. 6F) is not. Thus, when the putative NMTS of AML-3/PEBP2α/CBFA1 is deleted, nuclear import is retained but interaction with the nuclear matrix is abrogated. This result, together with our previous studies (26), establishes that the C termini of AML-3 and AML-1 each contain homologous intranuclear targeting signals.
... (8). Cells harvested 24 h post-transfection were prepared for detection of nuclear matrix-associated proteins as described under "Experimental Procedures," using rabbit polyclonal antibodies specific to each protein, and an objective with ×62 magnification.
Utilization of GST Fusion Proteins to Obtain the Crystal Structures of Small Peptides-The GST fusion protein expression system has been extensively used to generate large amounts of protein for structural studies. To prevent the interdomain flexibility introduced by the GST segment, GST fusion proteins are normally cleaved with proteases to release the GST moiety prior to crystallization. To date, the only published structure of a GST fusion protein is that of GST fused with a 6-amino acid conserved neutralizing epitope of gp41 from human immunodeficiency virus (12). We have determined the crystal structure of GST fused with a 41-amino acid peptide. Although the peptides attached to the C terminus of the GST protein were of different lengths, the crystal lattices of the two structures are very similar. Both belong to the tetragonal space group P4₃2₁2. The unit cell dimensions differ by less than 1%. The fused peptides at the C terminus of the GST protein protrude from the GST domain and occupy the space formed by the crystal lattice of the GST protein. Therefore, the GST lattices may promote formation of the crystals. Taken together, these results support the concept that GST fusion proteins are useful protein vehicles for solving the crystal structures of small peptides.
Conclusions-Molecular recognition depends on steric or electrostatic complementarity of the combining surfaces. We report here that the structure of the NMTS of AML-1/PEBP2αB/CBFA2 consists of two loops connected by a flexible U-shaped structure. The NMTS may interact with the nuclear matrix via the tips of the finger-like loop domains. We conclude that subnuclear targeting of CBFA/AML transcription factors to transcriptionally active sites is a highly selective process involving precise molecular recognition of specific components of nuclear architecture (8, 9, 27-29). The proposed model for the AML NMTS provides a basis for designing mutations that can further define structure-function interrelationships between the NMTS and the nuclear matrix.