The nuclear RPL4 gene encodes a chloroplast protein that co-purifies with the T7-like transcription complex as well as plastid ribosomes.

We have cloned and sequenced the cDNA and the gene coding for plastid ribosomal protein L4 (RPL4) from two higher plant species, spinach and Arabidopsis thaliana. Ribosomal protein L4 is one of the ribosomal proteins for which extraribosomal functions in transcriptional regulation have been demonstrated in prokaryotes. Sequence comparison of the two plant cDNAs and genes shows that the RPL4 gene has acquired a remarkable 3' extension during evolutionary transfer to the nuclear genome. This extension harbors an intron and codes for a glutamic and aspartic acid-rich amino acid sequence that resembles the highly acidic C-terminal tails of some transcription factors. The co-purification of ribosomal protein L4 with plastid RNA polymerase and transcription factor CDF2 using different purification protocols and the surprising amino acid sequence of the L4 protein make it a likely candidate for a role in plastid transcriptional regulation.

Higher plant plastid ribosomes are closely related to those found in eubacteria, reflecting the endosymbiotic origin of chloroplasts. They contain about 54-75 ribosomal proteins, depending on the plant species (1-3). The complete sequencing of several higher plant plastid genomes has shown that about one third of the plastid ribosomal (r) proteins are encoded by the plastid genome itself (4-7). The remaining two thirds are probably encoded by the nuclear genome, and it is thought that the genes have been transferred from the plastid genome to the nucleus during evolution. Of these nuclear-encoded plastid r-proteins, only a few have been characterized by their cDNAs and/or by their genes (see Harris et al. (1) and references therein).
During the last 10 years ribosomal proteins have attracted special interest because it was shown in a number of cases that they have extraribosomal functions apart from the ribosome and protein biosynthesis (for review, see Wool (8)). These extraribosomal functions concern basic cellular processes like replication, transcription, RNA processing, translation, and DNA repair. Specific functions in transcription have been reported for r-proteins S10 (9-12), L4 (13-15), and S14 (16). In all three cases, the regulation by r-proteins concerns the expression of ribosomal components (rRNA, operon S10, and mRNA for r-protein S14).
The plastid genome is transcribed by two different RNA polymerases. One is nuclear-encoded, T7-like, and especially active during early phases of plastid development. This polymerase is largely specialized for transcription of plastid housekeeping genes, including subunits of the second RNA polymerase, which is plastid-encoded, prokaryotic-like, and preferentially transcribes photosynthesis-related genes during later phases of plastid development (17-21). The transcriptional activity of the prokaryotic-type RNA polymerase is regulated by nuclear-encoded, prokaryotic-type, sigma-like factors (22,23).
Our group has been working for several years on the regulation of rrn transcription in spinach plastids. Transcription of the rrn operon is regulated by a transcription factor that we called CDF2 (24). CDF2 is thought to act in a switching mechanism between the nuclear-encoded T7-like plastid RNA polymerase and the plastid-encoded Escherichia coli-like RNA polymerase (25).
The first evidence that r-proteins could also play a role in regulation of rrn transcription in plastids came from CDF2 cross-linking experiments. Cross-linking of CDF2 to the 16 S RNA promoter is abolished by the presence of antibodies raised against 50 S r-proteins but not by antibodies raised against 30 S r-proteins. This result prompted us to analyze whether specific r-protein(s) co-purify with RNA polymerase and/or CDF2. If so, they should represent candidates for r-proteins implicated in transcriptional regulation. To this end we purified RNA polymerase together with the transcription factor CDF2, and we followed the co-purification of r-proteins with the transcription complex using antibodies raised against 50 and 30 S ribosomal proteins. We found that two r-proteins co-purify. In the present study we report on the cloning and sequencing of the corresponding cDNA and gene of one of the two r-proteins, the L4 protein, and on the partial characterization of the protein.
The protein is nuclear-encoded and has not yet been analyzed in higher plants, and comparison with its cyanobacterial ancestor reveals remarkable differences. It belongs to the group of r-proteins for which functions in transcriptional regulation have been demonstrated in prokaryotes.

EXPERIMENTAL PROCEDURES
cDNA Library Screening and Sequencing-A spinach cDNA ZAPII library (Stratagene) was screened with two degenerate oligonucleotides (5′-CCNTAYAARACNTCNAAYTTYTC-3′, 5′-GARACNTTYYTNAAYYTNAARAC-3′). Filters were prehybridized at 42°C for 5 h, and hybridization was performed overnight at 38°C (5 × SSC, 1% blocking reagent (Boehringer Mannheim), 0.1% N-laurylsarcosine, and 0.02% SDS). Filters were washed twice with 2 × SSC, 0.1% SDS at room temperature for 5 min and twice at 40°C for 15 min. Positive plaques were purified by two rounds of plating and hybridization. The pBluescript phagemid was recovered by excision according to the manufacturer's protocol, and the cDNA was sequenced using the T7 sequencing kit (Pharmacia Biotech Inc.).
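As an illustration of what "degenerate" means for the two probes, the following minimal Python sketch counts how many distinct sequences each oligonucleotide pool contains and checks which amino acids each degenerate codon can encode. It assumes the standard IUPAC ambiguity codes and the standard genetic code; the function names are hypothetical and not part of the original protocol. Under these assumptions, the second probe appears to be compatible with the stretch ETFLNLK of the N-terminal peptide reported under "Results".

```python
from itertools import product

# Standard IUPAC ambiguity codes (only those occurring in the two probes).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Probe sequences as given in the text, written 5' to 3'.
OLIGO_1 = "CCNTAYAARACNTCNAAYTTYTC"
OLIGO_2 = "GARACNTTYYTNAAYYTNAARAC"


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligo."""
    count = 1
    for base in oligo:
        count *= len(IUPAC[base])
    return count


def encoded_residues(oligo: str) -> list:
    """For each complete codon (frame 1), the set of amino acids it can encode."""
    residues = []
    for start in range(0, len(oligo) - len(oligo) % 3, 3):
        codon = oligo[start:start + 3]
        expansions = product(*(IUPAC[base] for base in codon))
        residues.append({CODON_TABLE["".join(c)] for c in expansions})
    return residues


def is_compatible(oligo: str, peptide: str) -> bool:
    """True if every residue of the peptide is encodable by the matching codon."""
    sets = encoded_residues(oligo)
    return len(peptide) <= len(sets) and all(
        aa in allowed for aa, allowed in zip(peptide, sets)
    )


print(degeneracy(OLIGO_1))                # 1024
print(degeneracy(OLIGO_2))                # 4096
print(is_compatible(OLIGO_2, "ETFLNLK"))  # True
```

Note that the leucine codon YTN also covers the phenylalanine codons TTY, so a degenerate probe can match more peptides than the one it was designed from.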
Genomic Library Screening-A spinach genomic library cloned in EMBL3 (kindly provided by B. Pelzer-Reith, Berlin) was screened using the entire RPL4 cDNA as a probe. Filters were prehybridized and hybridized at 68°C using the same protocol as for cDNA screening. The cDNA was labeled with [α-32P]dCTP by random priming. Filters were washed at 68°C for 15 min as for cDNA screening, followed by an additional wash at 68°C for 30 min in 0.5 × SSC, 0.1% SDS. Positive plaques were purified by two rounds of plating and hybridization. DNA digests were subcloned into pUC18 for sequencing.
Southern Blot Analysis-30 µg of genomic DNA were digested with the indicated restriction enzymes, and the digest was run overnight on a 0.8% agarose gel at 20 V. DNA was transferred to nylon membranes. Prehybridization and hybridization were performed for 5 h and overnight, respectively, at 42°C in 5 × SSC, 2% blocking reagent (Boehringer Mannheim), 0.1% N-laurylsarcosine, 0.02% SDS, and 45% formamide. The RPL4 cDNA probe (530-bp 5′ cDNA fragment) was labeled by random priming. The membrane was washed at room temperature for 10 min in 2 × SSC, 0.1% SDS, and twice at 65°C for 15 min in 1 × SSC, 0.1% SDS.
Cloning of Poly(A) Containing C-terminal Tails of RPL4 mRNAs-Total RNA was isolated from roots and cotyledons of 1-week-old spinach plantlets as described previously (20). 10 µg of RNA were used for RT-PCR reactions that were performed as described by Harrak et al. (27). Reverse transcription was performed with a primer consisting of an arbitrary sequence followed by a poly(T) tail (5′-CTTCCGATCCCTACGC(T)18-3′). One tenth of the reaction was PCR-amplified using the following primers: the preceding arbitrary primer with a strongly shortened poly(T) tail (5′-CTTCCGATCCCTACGCTTT-3′) and a primer corresponding to a region in the 3′ coding part of the RPL4 mRNA (5′-GCTGCAATGCAGAGGTGG-3′, thin arrow in Fig. 2). One half of the reaction products was separated on a 1.8% agarose gel, transferred to a nylon membrane, and hybridized with the RPL4 cDNA under the same conditions as described above; the other half was used for cloning and sequencing of the corresponding cDNA fragments.
L4 Overproduction and Antibody Preparation-The sequence corresponding to the mature L4 protein was amplified by PCR, and the product was cloned into the expression vector pET19b (Novagen). After induction with isopropyl-1-thio-β-D-galactopyranoside the His-L4 fusion protein was purified on TALON metal affinity resin (CLONTECH). The eluted protein was further purified by SDS-PAGE. The L4 protein was excised from the acrylamide gel, and the gel slice was directly used for antibody production (EUROGENTEC).
The preparation of plastid anti-30 S and anti-50 S antibodies is described in Dorne et al. (2).

RESULTS
Co-purification of Two Potential Ribosomal Proteins with CDF2 and RNA Polymerase-Starting from Percoll-purified spinach chloroplasts (18), the transcription factor CDF2 was purified successively on heparin-Sepharose, phosphocellulose, and 16 S promoter-agarose columns. After each column all eluted fractions were tested by gel-retardation assays for the presence of CDF2 (not shown; for methods, see Baeza et al. (24) and Iratni et al. (25)). All CDF2-containing fractions were pooled and dialyzed before being applied to the subsequent column. Fig. 1 shows the electrophoresis profile of the pooled CDF2-containing protein fractions after phosphocellulose (lane 1) and 16 S promoter-agarose chromatography (lanes 2 and 3). Proteins from the first 1 M NaCl elution step of the last column (lane 2) were transferred to Immobilon-P membranes and tested either with preimmune serum (lane 6), with antibodies raised against total 50 and 30 S ribosomal proteins (lane 4), or with antibodies raised against T3 RNA polymerase (lane 5). From this result we conclude that CDF2 co-purifies with the T7-like plastid RNA polymerase and two r-protein-like polypeptides on three different columns.
Cloning, Sequencing, and Characterization of the cDNA and Gene for One of the Two Supposed r-proteins-The two polypeptides that cross-reacted with the r-protein antibodies were cut out and their N-terminal amino acid sequences were determined. The study of one of these proteins is presented here. We used the N-terminal amino acid sequence of the smaller of the two proteins (ELIPLPILNFSGEKVAETFLNLKTA) to design two degenerate oligonucleotides and to isolate the corresponding cDNA. One hybridizing clone was obtained after screening of a leaf ZAPII cDNA library. The sequence is shown in Fig. 2A (EMBL accession no. X93160). It corresponds to a polypeptide with a calculated molecular mass of 32.34 kDa, including a 5.3-kDa transit peptide. The transit peptide starts with MA and contains 50% serine and threonine residues. These properties are characteristic of plastid transit peptides. The N-terminal sequence of the mature protein that was determined by amino acid sequencing is underlined. The protein is 33.7% identical in a 178-amino acid overlap to the ribosomal protein L4 from E. coli. Sequence comparison of the mature plastid L4-like protein with the corresponding E. coli protein shows the presence of a C-terminal extension. This extension is extremely negatively charged (62.5% of amino acids are glutamic or aspartic acids, see Fig. 5 for comparison). To ensure that this highly acidic extension does not correspond to a cloning artifact, we performed RT-PCR analysis using spinach cotyledon RNA. The size of the resulting cDNA of the complete RT-PCR reaction (Fig. 2B, lane 2) agrees well with the expected size of 358 bp. The band was cloned and sequenced. It corresponds exactly to the part of the sequence encompassing the two primers as shown in Fig. 2A. However, a control PCR reaction using the same primers on a spinach DNA preparation yielded a band of much higher molecular mass (Fig. 2B, lane 3).
This DNA band of about 2 kb was also cloned and sequenced, and the DNA was subsequently used as hybridization probe to isolate genomic clones. We found that the RPL4 gene contains only one large intron. This intron comprises 1688 bp and is located in the 3′ part that encodes the acidic extension. The position of the intron is marked by an open triangle in Fig. 2A. The gene sequence is accessible at the EMBL data bank (accession no. Y14932).
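The size of the genomic PCR product can be checked against the reported numbers with simple arithmetic. The short sketch below assumes that the single intron lies entirely between the two primers and is the only difference between the cDNA-derived and genomic amplicons; the function name is hypothetical.

```python
# Sizes reported in the text, in base pairs.
RT_PCR_PRODUCT_BP = 358  # product amplified from cDNA between the two primers
INTRON_BP = 1688         # the single intron of the spinach RPL4 gene


def expected_genomic_product(cdna_product_bp: int, intron_lengths_bp: list) -> int:
    """Expected amplicon size on genomic DNA: cDNA amplicon plus spanned introns."""
    return cdna_product_bp + sum(intron_lengths_bp)


genomic_bp = expected_genomic_product(RT_PCR_PRODUCT_BP, [INTRON_BP])
print(genomic_bp)  # 2046
```

The result, 2046 bp, is consistent with the band of about 2 kb obtained from the spinach DNA preparation.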
None of the isolated cDNA clones contained a poly(A) sequence at the 3′ end. To locate the poly(A) stretch we used the method of Kudla et al. (28). This method is based on RT-PCR using an arbitrarily 5′-extended oligo(dT) primer for cDNA synthesis. The subsequent PCR amplification was made using a primer that corresponds to the arbitrary primer and a primer located within the 3′-coding region of the RPL4 mRNA (thin arrows in Fig. 2, see also "Experimental Procedures"). The reaction was performed with three different RNA preparations: RNA from roots (Fig. 2C, lane 1, overexposed in 1′), cotyledons of dark-grown plantlets (lane 2), and cotyledons of plantlets grown in normal light/dark cycle (lane 3). The four obtained DNA fragments were cloned and sequenced. They correspond to four different polyadenylation sites that are indicated by asterisks in Fig. 2A. The distribution of the four differentially polyadenylated RNAs seems to vary in the three analyzed RNA preparations, with a higher proportion of smaller mRNAs in dark-grown cotyledons and in roots (Fig. 2C). Note that detection of root RNA needed 8 times longer exposure.
The RPL4-like Protein Is Present in the Plastid 50 S Ribosomal Subunit-Results of Fig. 1 show the co-purification of an L4-like protein with plastid RNA polymerase and CDF2, i.e. the protein exists in a soluble, non-ribosome-bound form or should be easily detachable from the ribosomal surface. To confirm that this protein is indeed a plastid ribosomal protein, we isolated plastid 70 S ribosomes and separated the two subunits by sucrose-gradient centrifugation (2). The separation profile is shown in the top of Fig. 3A. Peak 3 represents the 30 S ribosomal subunit, peak 4 the 50 S subunit. The proteins of regions 1-4 were precipitated, separated by SDS-PAGE, and transferred to nitrocellulose membranes. The filters were treated with antibodies raised against the plastid r-protein S1 (29) and with antibodies raised against the plastid L4-like protein (Fig. 3A, bottom). Both antibodies were prepared in rabbits using fusion proteins that were obtained by overproduction in E. coli and subsequent purification (see "Experimental Procedures"). As expected, the S1 protein is revealed only in the 30 S subunit fraction (Ab S1, lane 3), the L4-like protein is revealed in the 50 S subunit fraction (Ab L4, Fig. 3B, on the left. The third largest protein corresponds to the r-protein L4 (Fig. 3B, on the right). From these studies we conclude that one of the two anti-r-protein antibody-reactive proteins present in the purified transcriptionally active fraction (Fig. 1, lane 2) is the plastid r-protein L4.
Gene Number and Expression Studies-To answer the question whether two genes exist for the L4 protein, one encoding the acidic extension and the other not, we performed Southern analysis to estimate the number of genes coding for the L4 protein. Genomic DNA was digested with five different restriction enzymes, and agarose gel-separated digestion products were hybridized using radiolabeled RPL4 cDNA (Fig. 4A). With the exception of the SacI cleavage reaction, only one DNA band is labeled in each case. This strongly suggests the existence of a single gene only.
The gene is highly expressed in leaf/cotyledon tissues (Fig. 4B). Northern analysis did not reveal any hybridization signal in root tissues (not shown). However, an RT-PCR-based amplification method demonstrates the presence of mRNA in roots. The mRNA level is slightly higher in cotyledons of dark-grown plantlets (Fig. 4B, cotyledons/D) compared with cotyledons of plantlets grown in light/dark cycle (cotyledons/L). The mRNA level in cotyledons increases further if 1-week-old dark-grown plantlets are exposed to continuous light for 8 h (cotyledons/LD). The same relation of mRNA levels in cotyledons of plantlets grown in different light conditions was found by Northern analysis (not shown).
Comparison of the Spinach RPL4 cDNA and Gene to the Corresponding Arabidopsis cDNA and Gene-To determine whether the acidic extension of the L4 protein also exists in other plant species, we searched for the cDNA for the plastid r-protein L4 from Arabidopsis thaliana. An EST-tblastn search, using the first 110 N-terminal amino acids of the spinach protein, allowed us to detect the EST clone 116F23T7. The clone was obtained from the Arabidopsis EST Stock Center and completely sequenced. The sequence is accessible in the EMBL data bank under the accession no. Y14565. The sequence comparison of the Arabidopsis protein with L4 r-proteins from spinach, a cyanobacterium, E. coli, and Plasmodium falciparum is shown in Fig. 5. An acidic extension is also present in the Arabidopsis protein, but not in the cyanobacterial or Plasmodium proteins. This indicates that the extension was acquired after integration of the plastid gene into the nuclear genome during evolution. The Arabidopsis and the spinach protein sequences have an amino acid identity of 62.1% in a 282-amino acid overlap and possess highly conserved motifs in their transit peptides (40% amino acid identity). The Synechocystis L4 protein has the highest similarity to the higher plant plastid L4 proteins (52.9%), in accordance with the cyanobacterial origin of higher plant plastids. The amino acid sequence of the L4 protein of Porphyra purpurea is still 44.4% identical, but the similarity decreases to 31.9% for E. coli L4 and to 26.2% for the L4 protein of the plastid-like genome of the malaria parasite P. falciparum.
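Identity values such as "62.1% in a 282-amino acid overlap" can be read as the fraction of identical positions within the aligned overlap. The sketch below shows one common convention (columns containing a gap in either sequence are excluded from the overlap); the alignment program actually used is not stated in the text and may count overlap differently. The two short aligned strings are hypothetical toy data, not the real L4 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not overlap:
        raise ValueError("no overlapping positions")
    matches = sum(x == y for x, y in overlap)
    return 100.0 * matches / len(overlap)


def implied_identical_positions(percent: float, overlap_length: int) -> int:
    """Number of identical positions implied by a reported identity value."""
    return round(percent * overlap_length / 100)


# Hypothetical toy alignment: 10 gap-free columns, 8 of them identical.
print(percent_identity("ELIPLPIL-NF", "ELVPLPVLDNF"))  # 80.0

# Values reported in the text.
print(implied_identical_positions(62.1, 282))  # 175 (spinach vs. Arabidopsis)
print(implied_identical_positions(33.7, 178))  # 60  (spinach vs. E. coli)
```

Read this way, the reported figures correspond to roughly 175 identical residues between the two plant proteins and roughly 60 between the spinach and E. coli proteins.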
To analyze whether the intron in the region of the highly acidic extension is conserved between species, we screened a genomic library of Arabidopsis thaliana, ecotype Columbia, that was cloned in GEM11 using the cDNA as a probe. Two positive clones were selected and sequenced. The sequence is available at EMBL under the accession no. Y14566. The sequence shows that the Arabidopsis RPL4 gene also contains only one intron. This intron is much shorter (152 bp) than that of the RPL4 gene of spinach (1688 bp) but is also located at the 3′ end within the region coding for the acidic extension. The two intron positions are marked by filled triangles in Fig. 5.
Fig. 4B (legend fragment): ... and darkness but exposed for 8 h to continuous light before harvesting (LD) were used for RT-PCR amplification using the primers noted in Fig. 2 by bold long arrows. Amplification products were separated on agarose gels, blotted to nitrocellulose, and hybridized to the labeled RPL4 cDNA fragment encompassing the same primers that were used in the RT-PCR reaction.
DISCUSSION
We have cloned and sequenced the cDNA and gene coding for the plastid ribosomal protein L4 from two higher plant species. Protein L4 is known to regulate the expression of the 11-gene r-protein operon S10 in E. coli (30) by interfering in a mechanism of attenuation (31). L4 is encoded by the third gene of the S10 operon (see Fig. 6). In cyanobacteria the S10 gene is translocated to the str operon (32). In algae (34-37), higher plants (4-7), and also in the liverwort Marchantia polymorpha (38), the first three genes of the E. coli S10 operon are absent from the plastid genome. The RPL4 gene is also absent from the cyanelle genome of Cyanophora paradoxa (39) and from the plastid genome of the nonphotosynthetic, parasitic plant Epifagus virginiana (40). It is therefore supposed that these genes have been transferred to the nuclear genome during evolution. This also means that the target of L4-mediated transcriptional regulation, i.e. the S10 leader region, has disappeared, and it raises the question whether any regulatory function of the L4 protein on the gene expression level has been conserved during evolution.
In the present work we report on the co-purification of the plastid r-protein L4 with RNA polymerase and the transcription factor CDF2 in spinach. Our experiments were focused on the purification of the two CDF2 proteins of 30 and 33 kDa that were shown to cross-link to the rrn operon promoter region (24). As preliminary cross-linking experiments in the presence of 50 S r-protein antibodies had suggested that r-proteins are engaged in CDF2-DNA complex formation, we tested all column fractions for the presence of CDF2 by gel retardation assays and simultaneously for the presence of ribosomal proteins by antibody reactions on Western blots. We found that two potential r-proteins co-purified on three different successive columns with CDF2. One of them is the r-protein L4. This is shown by N-terminal amino acid sequencing and by isolation and sequencing of the corresponding cDNA and gene.
Sequence analysis shows that the spinach plastid L4 protein belongs to the nuclear-encoded plastid r-proteins that do not have an N-terminal extension in addition to the transit peptide when compared with the E. coli counterpart (Fig. 5). The maturation site of the protein is exactly determined by the beginning of the N-terminal sequence of the mature protein (Fig. 2A). We also determined the polyadenylation sites of the L4 mRNA (Fig. 2, A and C). We found four different polyadenylation sites that might be used differentially in root and cotyledon tissues. Whether poly(A) site selection contributes to regulation of gene expression remains an open question. Up to now, no tissue-specific or development-specific alternative polyadenylation site selection has been described for plants.
Many of the known nuclear-encoded plastid r-proteins contain C-terminal extensions compared with their E. coli counterparts. However, the C-terminal extension of the L4 protein is unique in its amino acid composition. The evolutionary transfer of the RPL4 gene from the ancestral cyanobacterial genome to the host nuclear genome was followed by the acquisition of a highly acidic C-terminal protein domain. Such highly acidic C-terminal extensions are found in a number of chromatin-interacting proteins and transcription factors like transcription factor Dr1 (41), the δ factor of Bacillus subtilis (42), nucleolar transcription factor UBF1 (43), nucleosome assembly protein (44), HMG group proteins (45,46), the smallest subunit (C31) of yeast RNA polymerase III (47), RNA polymerase II transcription factor PTFγ (48), transcriptional repressor protein AEBP1 (49), and RNA polymerase II associated factor Paf1P (50). In the case of the HMG-D protein, it was shown recently that the acidic tail plays a role in alteration of the structural selectivity of DNA binding (44). The acidic tail of the C31 subunit of yeast RNA polymerase seems to be engaged in transcription initiation (51).
E. coli r-protein L4 is an RNA-binding protein (52). Although the exact mechanisms of the L4-regulated attenuation control of the S10 operon are not yet completely understood, it can be assumed that the L4 protein interacts with E. coli RNA polymerase and/or the NusA protein. Preliminary assays using renatured spinach plastid L4 protein in a heterologous system with E. coli RNA polymerase suggest that the plastid L4 protein enhances promoter binding of the E. coli enzyme (not shown). Further experiments to confirm this result and to establish a homologous transcription system for the two types of plastid RNA polymerases (E. coli-like and T7-like) (18,25) are in preparation. It is tempting to speculate that the plastid L4 protein has acquired an acidic C-terminal extension to fulfill additional role(s) in transcriptional regulation.