A large DNA-binding nuclear protein with RNA recognition motif and serine/arginine-rich domain.

A cDNA encoding a large DNA-binding protein (NP220) of 1978 amino acids was isolated from human cDNA libraries. Human NP220 binds to double-stranded DNA fragments by recognizing clusters of cytidines. Immunofluorescent microscopy with antiserum directed against NP220 revealed a punctate or “speckled” pattern and coiled body-like structures in the nucleoplasm of various human cell lines. These structures diffused in the cytoplasm during mitosis. Western blot analysis showed that NP220 is enriched in the lithium 3,5-diiodosalicylate-insoluble fraction of nuclei. The domain essential for DNA binding is localized in the C-terminal half of NP220. Human NP220 shares three types of domains (MH1, MH2, and MH3) with the acidic nuclear protein, matrin 3 (Belgrader, P., Dey, R., and Berezney, R. (1991) J. Biol. Chem. 266, 9893-9899). MH1 is a 48-amino acid sequence near the N terminus of both human NP220 and rat matrin 3. MH2 is a 75-amino acid sequence homologous to the RNA recognition motifs of heterogeneous nuclear RNPs I and L. It is repeated three times in NP220 and twice in matrin 3. MH3 is a 60-amino acid sequence at the C terminus of both NP220 and matrin 3. NP220 has an arginine/serine-rich domain commonly found in pre-mRNA splicing factors. Close to the domain essential for DNA binding, there are nine repeats of the sequence LVTVDEVIEEEDL. Thus, NP220 is a novel type of nucleoplasmic protein with multiple domains.

The nucleus is a highly organized structure, permeated by a proteinous "matrix" and including several subcompartments. Lamins are concentrated near the periphery of the nucleoplasm. By coiled-coil self-association (1-3), lamins A, B, and C can form a meshwork throughout the nucleoplasm. Lamin B1 interacts with chromatin (4) at clusters of adenosine- or thymidine-rich sequences called "matrix-attachment regions" (MARs) (5). The DNA of chromatin is organized into constrained loops of ~60 kilobases (6,7), and MARs are thought to form the basis of such chromosomal loops. ARBP was first purified from chicken oviduct cells (8) as a MARs-binding protein. Thymus-specific SATB1 (9) and HeLa cell SAF-A (10) have been cloned and shown to bind MARs. Matrins D, E, F, G, and 4 are another group of DNA-binding proteins (11-13). They may be important for organizing chromosomes, localizing genes, and regulating DNA transcription and replication (14-17).
The heterogeneous nuclear ribonucleoproteins (hnRNPs) condense and package growing RNA transcripts. Because of these functions, hnRNPs are also important components of the interchromatin space. The primary sequences of many hnRNPs including hnRNP B, C, and E from grasshopper (30), Drosophila (31), Xenopus (32), and human (33,34) have been determined, and they are grouped into different categories based on the RRM (35).
We describe here a novel nucleoplasmic protein (NP220) of human cells with an estimated size of 220 kDa. Human NP220 binds cytidine-rich sequences in double-stranded DNA (dsDNA). It is striking that NP220 also has RRMs similar to those of hnRNPs I and L and an SR domain. Thus, NP220 is a novel type of nucleoplasmic protein whose multiple domains suggest functions related to both RNA and DNA.

EXPERIMENTAL PROCEDURES
Cloning of Human NP220 cDNAs-Clone K1 was selected from a λgt11 library of HeLa cells (Clontech) with a dsDNA fragment of human mitochondrial promoter region (nucleotides 322-581 in Ref. 36) essentially as described (37). Clones M5 and N9 were selected from λZAPII libraries of HeLa cells and Namalwa cells, respectively, with clone K1 as the probe. Clone HK1 was selected from the λZAPII library of human heart (Stratagene) with clone N9 as the probe, and clone HK2 was selected with HK1 as the probe. All λZAPII libraries were prepared using random primers. Both strands of inserted sequences were sequenced by the dideoxynucleotide method (38) using a DNA sequencing system model 370A of Applied Biosystems. The 5′-terminal sequence was determined by the rapid amplification of cDNA ends (RACE) method (39) using three nested 30-mer primers complementary to the known 5′-end sequence of human NP220 and two anchor primers of 5′-GGAATTCTCGAGTCGACATCG with and without A(T)17.
* This work was supported in part by Grants-in-aid for Scientific Research 06454075 and 07282207 from the Ministry of Education, Science and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleic acid sequence and deduced amino acid sequence of human NP220 are available from the DDBJ/EMBL/GenBank DNA data bases under accession number D83032.
Definition of the Domain Essential for DNA Binding-A series of partial human NP220 polypeptides was prepared by subcloning the inserted sequence of clone K1 into pKK223-3, digesting it with PstI or HindIII, and ligating (pK1, pK-H, and pK-P in Fig. 1). A Southwestern blot of the polypeptides with the fragment of mitochondrial promoter region was performed mainly as described (40). Hybridization was carried out for 1 h in TED buffer containing 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, and 1 mM dithiothreitol, and the filter was washed twice for 30 min with TED buffer containing 50 mM NaCl.
Sequence Selectivity of Human NP220 for DNA Binding-The preferential sequence of human NP220 for DNA binding was studied by the modified selected and amplified binding sequence (SAAB) method (41). For this, the product of pK1 (Fig. 1) was separated by SDS-polyacrylamide gel electrophoresis (42), renatured, and transferred to a nitrocellulose filter. The corresponding part of the filter was treated with 5% skim milk suspended in 10 mM Tris-HCl (pH 7.5) for 1 h, washed twice with TED buffer, and incubated for 1 h with ~10 ng/ml dsDNA fragments having 20 bp of random sequence (5′-TTGCTCACTCGAGACACC-(N)20-GCACATCTAGACGTTAGC-3′; the flanking cassettes contain XhoI (CTCGAG) and XbaI (TCTAGA) sites, respectively) in TED buffer containing 10 mg/ml poly(dI·dC)/poly(dI·dC). After washing the filter twice with TED buffer containing 50 mM NaCl for 30 min, bound DNA was eluted with buffer containing 20 mM Tris-HCl (pH 8.0), 1 mM EDTA, 500 mM NaCl, and 0.1% (w/v) SDS and amplified by polymerase chain reaction. Successive rounds of selection and amplification were repeated six times, and the selected sequences were determined after cloning into XhoI-XbaI sites of pBluescript.
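The design of the random oligonucleotide pool can be illustrated with a short sketch. It is not part of the original procedure: the function names `random_oligo` and `extract_core` are hypothetical, and the pool is simulated with a pseudo-random generator only to show how the two 18-nt cassettes frame the 20-nt random core and carry the XhoI and XbaI sites used for cloning.

```python
import random

# Cassette sequences quoted in the text (5'->3').
LEFT_CASSETTE = "TTGCTCACTCGAGACACC"
RIGHT_CASSETTE = "GCACATCTAGACGTTAGC"
XHOI_SITE = "CTCGAG"  # XhoI recognition sequence
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence
CORE_LENGTH = 20      # length of the random (N)20 core


def random_oligo(rng):
    """Return one simulated library member: left cassette + random core + right cassette."""
    core = "".join(rng.choice("ACGT") for _ in range(CORE_LENGTH))
    return LEFT_CASSETTE + core + RIGHT_CASSETTE


def extract_core(oligo):
    """Return the random core of a library member, checking both cassettes."""
    if not (oligo.startswith(LEFT_CASSETTE) and oligo.endswith(RIGHT_CASSETTE)):
        raise ValueError("oligonucleotide does not carry both cassettes")
    return oligo[len(LEFT_CASSETTE):len(oligo) - len(RIGHT_CASSETTE)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the simulation is repeatable
    oligo = random_oligo(rng)
    print(len(oligo), extract_core(oligo))
    # Number of distinct 20-nt cores that are theoretically possible.
    print(4 ** CORE_LENGTH)
```

Each simulated member is 56 nt long (18 + 20 + 18), and the theoretical number of distinct cores is 4^20, roughly 1.1 x 10^12.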
Preparation of Antibody against Human NP220-The XbaI-HindIII fragment of clone K1 was subcloned into pGEX-3X (pK1-BD in Fig. 1), and the fusion protein was purified (43) and injected into a rabbit. The antibody was affinity-purified from antiserum (44).
Cell Fractionation and Western Blotting-HeLa cells were lysed by gentle addition of a buffer containing 10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, and 0.5% (v/v) Nonidet P-40 and incubation on ice for 10 min. The nuclei were precipitated by centrifugation of the cell lysate at 1000 × g for 10 min, and the supernatant was analyzed as cytoplasm. Lithium 3,5-diiodosalicylate (LIS) supernatant and insoluble ("nuclear matrix") fractions were prepared by the low salt method (45) with a buffer system essentially as described (46) with supplements of 0.1 mM sodium tetrathionate, 1.0 mM phenylmethylsulfonyl fluoride, and 15 kallikrein-inactivating units/ml aprotinin. The fractions were separated by SDS electrophoresis (42) using 4% (for NP220) or 4-20% (for lamin B1) (w/v) acrylamide gel. As a control for proteolytic degradation of NP220 during the fractionation, packed HeLa cells were directly lysed in the SDS sample buffer and separated in parallel. Western blotting was performed as described (47) using anti-human NP220 antibody or monoclonal antibody against lamin B1 (Matritech Inc.).

RESULTS AND DISCUSSION
Cloning of Human NP220 cDNA and Its Sequence-In an attempt to isolate cDNA clone(s) encoding mitochondrial promoter binding protein(s), we screened a HeLa cell cDNA library constructed in λgt11 with a dsDNA fragment encompassing two human mitochondrial promoters (HSP and LSP; see Ref. 48) and obtained clone K1 (Fig. 1). Since Northern hybridization of poly(A)+ RNA from HeLa cells, human epidermoid carcinoma A431 cells, and human hepatoma Hep G2 cells showed a band of ~6 kilobases, we continued the screening of HeLa and Namalwa cell cDNA libraries in λZAPII with clone K1 and obtained clones M5 and N9, respectively (Fig. 1). By further screening of a human heart cDNA library, clones HK1 and HK2 were isolated (Fig. 1). This series of clones covered 6459 bp of sequence. We further used RACE to obtain an additional 112 bp of sequence at the 5′ terminus. The sequence of altogether 6571 bp (Fig. 2) has one open reading frame encoding a large polypeptide followed by a polyadenylation signal of AATAAA at 6473 nt and 73 bp of poly(A) tail starting at 6499 nt (Fig. 2). At the 5′ terminus, there are several ATG codons preceded by sequences similar to the ribosome recognition sequence (49). We assume that the ATG codon at 316 nt is the initiation codon since the open reading frames starting at upstream ATGs are very short. The open reading frame encodes a sequence of 1978 amino acids with a calculated molecular mass of 220,617. Since many lines of evidence showed this is a nuclear protein localized in the interchromatin space of various human cell lines, we refer to it as NP220. Human NP220 has unusually high contents of glutamic acid (10.4%), lysine (9.3%), and serine (11.0%). The contents of acidic (Glu and Asp) and basic (Lys, Arg, and His) amino acids are both ~16% and they tend to form clusters. NP220 is a hydrophilic protein without any large hydrophobic domain.
FIG. 1, C. Expression plasmids obtained by subcloning the original cDNA clones in B. pK1, pK1-H, and pK1-P are in pKK223-3 plasmid and used for the study of DNA binding activity (Fig. 5). pK1-BD is in pGEX-3X plasmid and used for the antibody preparation.
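The sequence coordinates and composition values reported for NP220 can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes 1-based nucleotide numbering and that "316 nt" refers to the A of the initiator ATG, and the helper names (`last_sense_nt`, `composition_percent`, `charged_percent`) are hypothetical. The composition helpers would need the deduced NP220 sequence (accession D83032), which is not reproduced here.

```python
from collections import Counter

# Values quoted in the text; positions are assumed to be 1-based.
CLONED_BP = 6459          # sequence covered by the overlapping cDNA clones
RACE_BP = 112             # extra 5' sequence obtained by RACE
START_CODON_NT = 316      # assumed position of the A of the initiator ATG
PROTEIN_LENGTH_AA = 1978  # residues in the deduced polypeptide
POLYA_START_NT = 6499     # first nucleotide of the poly(A) tail
POLYA_LENGTH_BP = 73      # length of the poly(A) tail


def last_sense_nt(start_nt, n_residues):
    """Return the position of the last nucleotide of the last sense codon."""
    return start_nt + 3 * n_residues - 1


def composition_percent(protein, residues):
    """Return the percentage of each requested residue in a one-letter protein string."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein sequence is empty")
    counts = Counter(protein)
    return {aa: 100.0 * counts[aa] / len(protein) for aa in residues}


def charged_percent(protein):
    """Return (acidic %, basic %), counting Glu+Asp and Lys+Arg+His as in the text."""
    comp = composition_percent(protein, "DEKRH")
    return comp["D"] + comp["E"], comp["K"] + comp["R"] + comp["H"]


if __name__ == "__main__":
    print("total cDNA length:", CLONED_BP + RACE_BP)
    print("last sense nucleotide:", last_sense_nt(START_CODON_NT, PROTEIN_LENGTH_AA))
    print("last poly(A) nucleotide:", POLYA_START_NT + POLYA_LENGTH_BP - 1)
```

Under these assumptions the cloned and RACE-derived segments sum to 6571 bp, the last sense codon ends at nt 6249 (upstream of the AATAAA signal at 6473 nt), and a 73-bp poly(A) tail starting at 6499 nt ends at nt 6571, in agreement with the stated total length.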
Domain Structure of Human NP220-Human NP220 has three types of internally repeated amino acid sequences (Figs. 1 and 2). The first type is the sequence rich in arginine and serine (RS domain) at residues 471-574, where 58 out of 104 amino acids are either arginine or serine (Fig. 3A). RS domains are found in pre-mRNA splicing regulators of Drosophila (21-23), mammalian U1 snRNP 70-kDa protein (50,51), and many non-snRNP splicing factors (24-27). RS proteins detected with monoclonal antibody against a common epitope range in size from 20 to 75 kDa (28). Additional RS proteins detected with a new monoclonal antibody in reconstituted spliceosomes in vitro include one that is larger than 200 kDa (29).
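The arginine/serine content quoted for the RS domain (58 of 104 residues, about 56%) is the kind of figure a sliding-window scan would report. The following is a minimal sketch under the assumption that such a domain is located by taking the window with the highest Arg+Ser fraction; this is not stated to be the method used here, and the function names are hypothetical.

```python
def rs_fraction(segment):
    """Return the fraction of residues in the segment that are Arg (R) or Ser (S)."""
    segment = segment.upper()
    if not segment:
        raise ValueError("segment is empty")
    return sum(1 for aa in segment if aa in "RS") / len(segment)


def richest_rs_window(protein, window=104):
    """Return (start, fraction) for the first window with the highest R+S fraction.

    The start index is 0-based; add 1 to compare with residue numbers in the text.
    """
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    best_start, best_fraction = 0, -1.0
    for start in range(len(protein) - window + 1):
        fraction = rs_fraction(protein[start:start + window])
        if fraction > best_fraction:
            best_start, best_fraction = start, fraction
    return best_start, best_fraction


if __name__ == "__main__":
    # Toy sequence, not NP220: a short RS stretch flanked by alanines.
    toy = "A" * 10 + "RSRSRS" + "A" * 10
    print(richest_rs_window(toy, window=6))
    # Fraction quoted in the text for residues 471-574.
    print(round(100 * 58 / 104, 1))
```

The default window of 104 residues simply mirrors the length of the region 471-574; any other window length could be passed instead.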
The second type of repeat is a ~76-amino acid sequence repeated three times at residues 677-753, 906-981, and 1010-1084 (Fig. 4C). Since a homologous sequence is found in rat matrin 3 (52), we refer to it as an MH2 domain (see below). Together with the sequence in rat matrin 3, the MH2 repeats constitute an RRM similar to the RRMs of hnRNPs I and L (53). hnRNP I, also known as polypyrimidine tract-binding protein (54,55), binds to hnRNA through this RRM.
The third type of repeat is at the C terminus of human NP220 (Figs. 1 and 2), where characteristic sequences repeat nine times (Fig. 3B). Since 6 out of 13 amino acids in the consensus sequence are acidic, we refer to it as the acidic repeat. Since the acidic repeats contain many amino acids with an oxygen atom capable of interacting with metals as in the EF hand (56), we tested the calcium binding ability of this domain by expressing the inserted sequence of clone M5 (Fig. 1) in Escherichia coli, separating the product by SDS electrophoresis, blotting to nitrocellulose, and probing with 45CaCl2 (57). Although the product gave a radioactive signal (results not shown), the binding could be demonstrated only at calcium concentrations above 0.1 mM.
As summarized in Fig. 4, human NP220 shares three types of domains (MH1, MH2, and MH3 domains) with matrin 3, which had been cloned by Belgrader et al. (52) from a rat cDNA library. In the MH1 domain, more than 70% of the amino acids are identical or similar. In the MH2 domains, ~50% of the amino acids are similar, and both NP220 and matrin 3 retain the core sequences of RRM found in hnRNPs I and L, suggesting that they form a subfamily within the large superfamily of RNA-binding proteins (35). In MH3, 42% of the amino acids are identical or similar.
DNA Binding Activity of Human NP220-The original cDNA clone (K1) of human NP220 was obtained due to the DNA binding activity of the product. In order to define the domain essential for this DNA binding, we prepared a series of pKK223-3 plasmids with clone K1 sequences progressively truncated at the 3′-end (pK1, pK1-H, and pK1-P in Fig. 1). Southwestern blotting with a 32P-labeled fragment of the mitochondrial promoter region as probe showed binding for pK1 and pK1-H but not for pK1-P (Fig. 5). This shows that residues 1353-1477 (Figs. 1 and 2) are essential for DNA binding. Except for the relatively high content of serine (17.6%), this domain has no characteristic motif found in other DNA-binding proteins.
A modified SAAB method employing the pK1 product showed that NP220 preferentially binds to cytidine clusters in either strand of dsDNA. Thus, after six rounds of successive selection and amplification of oligonucleotides having 20 bp of random sequence, fragments having the consensus sequence of CCCCC(G/C) were selected (Fig. 6A). Since both mitochondrial promoters (HSP and LSP) have such cytidine clusters (Fig. 6B), it is reasonable that clone K1 was isolated as a binding protein of the mitochondrial promoter region. It is worthwhile to note that this preferential DNA target of NP220 is distinct from the A- and T-rich sequences in MARs. This shows that the DNA binding specificity of NP220 is different from that of ARBP (8), SATB1 (9), and SAF-A (10).
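Because the consensus CCCCC(G/C) may lie on either strand, locating candidate sites in a dsDNA fragment requires scanning both the given sequence and its reverse complement. The sketch below illustrates one way to do this with a regular expression; it is an illustration rather than the analysis used here, the function names are hypothetical, and the example sequences are toy inputs, not the mitochondrial promoters.

```python
import re

# Consensus CCCCC(G/C); the lookahead lets overlapping sites be reported.
CONSENSUS = re.compile(r"(?=(C{5}[GC]))")
SITE_LENGTH = 6
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_cytidine_clusters(seq):
    """Return (strand, start) pairs for every consensus match on either strand.

    The start is the 0-based position, on the given (top) strand, of the
    leftmost base pair covered by the site.
    """
    seq = seq.upper()
    hits = [("+", m.start()) for m in CONSENSUS.finditer(seq)]
    bottom = reverse_complement(seq)
    for m in CONSENSUS.finditer(bottom):
        # Convert the bottom-strand coordinate back to top-strand numbering.
        hits.append(("-", len(seq) - m.start() - SITE_LENGTH))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    print(find_cytidine_clusters("AACCCCCGTT"))  # site on the top strand
    print(find_cytidine_clusters("TTCGGGGGAA"))  # site on the bottom strand
```

A guanosine cluster on the given strand is therefore reported as a bottom-strand site, consistent with binding to cytidine clusters in either strand.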
Intranuclear Localization of Human NP220-We prepared a polyclonal antibody directed against human NP220 by subcloning the XbaI-HindIII fragment (4034-4746 nt) into pGEX-3X (pK1-BD in Fig. 1) and immunizing a rabbit with the product. Fig. 7 shows a Western blot with this antibody of subcellular fractions prepared from a HeLa cell homogenate. A signal was seen exclusively in the nuclear fraction (Fig. 7A), as for lamin B1 as an endogenous marker (Fig. 7B). Most of the signal remained associated with the LIS-insoluble so-called nuclear matrix fraction (45) with a minor signal in the LIS-soluble fraction, as for lamin B1. The staining pattern of subcellular fractions suggested some degradation of NP220. Comparable degradation was observed in whole cell lysates prepared by immediate solubilization of living cells with SDS sample buffer.
FIG. 5. Definition of the domain in human NP220 essential for DNA binding. A series of fragments of NP220 was expressed in E. coli by subcloning the inserted sequence of K1 into pKK223-3, digesting it with PstI or HindIII and ligating to yield pK1, pK-H, and pK-P in Fig. 1. Extracts of E. coli expressing pK1 (lanes a), pK-H (lanes b), and pK-P (lanes c) were separated by SDS electrophoresis, and the transblotted filters were protein stained or hybridized to the fragment of mitochondrial promoter region. Arrowheads indicate the migration of the specific recombinant products.
FIG. 6. dsDNA fragments selected by pK1 product. A, synthesized dsDNA fragments having 20 bp of random sequences between 18 bp each of two cassette sequences for amplification and cloning (5′-TTGCTCACTCGAGACACC-(N)20-GCACATCTAGACGTTAGC-3′) were selected by binding to the K1 product and then amplified by polymerase chain reaction. After six rounds of successive selection and amplification, the fragments were cloned into pBluescript. Sequences of either strand of the inserted fragments in randomly selected clones are arranged to give maximal matching. Nucleotides in random and cassette regions are written with uppercase and lowercase letters, respectively. The consensus sequence is presented at the bottom. B, the sequences in human mitochondrial promoters (HSP and LSP) similar to the consensus sequence in A are indicated by shading.

FIG. 7. Western blot analysis of subcellular fractions from HeLa cells. A homogenate of HeLa cells was separated into nuclear and cytoplasmic fractions, and nuclei were further fractionated into LIS-soluble (supernatant) and insoluble (nuclear matrix) fractions. The whole cell fraction was prepared by immediate lysis of living cells with SDS sample buffer. After SDS electrophoresis, proteins on the transblotted filters were detected by protein staining or with the antibody against human NP220 in A or with monoclonal antibody against lamin B1 in B.
NP220 migrated in SDS electrophoresis as a 250-kDa polypeptide. This anomalous migration was probably due to the high content of charged amino acids and their clustering.
Indirect immunofluorescence microscopy of interphase HeLa cells with the anti-NP220 antibody showed a diffuse nucleoplasmic signal excluding nucleoli and concentrated in a punctate or "speckled" pattern (Fig. 8A). Similar staining patterns have been observed with antibodies against snRNP proteins, antibodies against 2,2,7-trimethylguanosine (m3G) cap structure, and antisense probes targeted to spliceosomal snRNAs (58-60). The punctate or "speckled" snRNP distribution results from the association of snRNPs with perichromatin fibrils, interchromatin granules, and coiled bodies (19,20). Many hnRNPs and the spliceosome-associated SR proteins also showed a "speckled" pattern (29,61). This suggests that NP220 may be important for packaging or processing the nascent transcripts. It is striking that NP220 is also enriched in two or three coiled body-like structures in each cell (Fig. 8A). Comparable staining of NP220 was observed in other human cell lines such as Hep G2, A431, and non-transformed fibroblast SFYT cells (results not shown). Fig. 8, B and C, depicts the behavior of NP220 during mitosis. A diffuse cytoplasmic signal is seen with exclusion from the condensed chromatin. Probably due to the reorganization of coiled bodies during mitosis, the bright stain observed in interphase cells (Fig. 8A) disappeared after the onset of mitosis. Such behavior of NP220 during mitosis was strikingly similar to that of spliceosomal snRNPs (62).
Conclusion-The domain structure of NP220 is intriguing. This protein has one RS domain, as do many non-snRNP splicing factors, has RRMs similar to those of hnRNPs I and L, and shows a speckled distribution in the interchromatin space. These features strongly suggest that NP220 is associated with packaging, transferring, or processing transcripts. Since there are at least 20 different proteins in hnRNP complexes (35) and the family of spliceosome-associated SR proteins is rapidly growing (29), NP220 may eventually be considered as one of the hnRNPs or non-snRNP proteins involved in RNA processing. The DNA binding ability of NP220, however, makes one wonder whether NP220 is really related to RNP proteins. Considering other proteins that bind both DNA and RNA such as TFIIIA (64) and the bicoid of Drosophila (65,66), NP220 may have an unexpected function. The role of the acidic repeats next to the DNA-binding domain requires further study. In addition, the MH1 and MH3 domains include motifs not found in any class of nuclear proteins. Thus, human NP220 is a novel nuclear protein with multiple domains.