Messenger RNA recognition by fragments of ribosomal protein S4.

Ribosomal protein S4 from Escherichia coli binds a large domain of 16 S ribosomal RNA and also a pseudoknot structure in the alpha operon mRNA, where it represses its own synthesis. No similarity between the two RNA binding sites has been detected. To find out whether separate protein regions are responsible for rRNA and mRNA recognition, proteins with N-terminal or C-terminal deletions have been overexpressed and purified. Protein-mRNA interactions were detected by (i) a nitrocellulose filter binding assay, (ii) inhibition of primer extension by reverse transcriptase, and (iii) a gel shift assay. Circular dichroism spectra were taken to determine whether the proteins adopted stable secondary structures. From these studies it is concluded that amino acids 48-104 make specific contacts with the mRNA, although residues 105-177 (out of 205) are required to observe the same toeprint pattern as full-length protein and may stabilize a specific portion of the mRNA structure. These results parallel ribosomal RNA binding properties of similar fragments (Conrad, R. C., and Craven, G. R. (1987) Nucleic Acids Res. 15, 10331-10343, and references therein). It appears that the same protein domain is responsible for both mRNA and rRNA binding activities.

Functional studies of ribosomes have tended to focus on the roles of the ribosomal RNAs in recent years, as a number of studies have uncovered specific contributions of different rRNA domains to ribosome activities (1). As more ribosomal protein sequences have become available, it is becoming clear that a number of these proteins are highly conserved among all organisms and must also have specific and necessary roles in ribosome function. An intriguing set of ribosomal proteins are those that bind directly and independently to the ribosomal RNAs and also autogenously regulate ribosomal protein expression. In many cases the regulation is due to protein recognition of the mRNA translational initiation region (2), although protein binding to a pre-mRNA splice site has also been observed (3). These instances of a single protein carrying out two different RNA-related functions provide interesting systems for studying how RNA recognition has evolved and is related to specific protein functions (4).
In several instances there is convincing similarity between the secondary structures of the mRNA and rRNA targets of an autoregulatory ribosomal protein (4-6). It is reasonable to conclude that both mRNA and rRNA bind in the same active site of any one of these proteins. In other cases there is no obvious similarity between the two RNA substrates. For instance, the mRNA target site for Escherichia coli S4 protein is a complex pseudoknot of about 110 nucleotides within the α operon mRNA (7). Nearly the entire 5′ domain of the 16 S rRNA, a fragment of 460 nucleotides, is needed to form the ribosomal binding site for the protein (8), although a smaller region is protected from cleavage by bound protein (9). There is no primary or secondary structural similarity between the two target sites, for which there are two possible explanations. S4 may be recognizing a three-dimensional RNA structure that is common to the two RNAs but not obvious from comparisons at the secondary structure level, or separate rRNA and mRNA binding domains may have evolved in S4. In vivo experiments showing that mutant S4 proteins with C-terminal deletions assemble into ribosomes but are defective in mRNA regulation (10,11) have been interpreted in favor of the latter explanation (12).
In this paper we build on the previous work of Craven and colleagues (12-15), who cleaved S4 by various methods and studied the ability of the protein fragments to bind 16 S rRNA and promote ribosome assembly in vitro. We have prepared a number of similarly truncated proteins by overexpression of the S4 gene rather than cleavage methods, and show that N- and C-terminal regions of S4 that are not necessary for rRNA binding are also not required for S4 recognition of the α mRNA pseudoknot. The results show that no more than ~130 of the 205 amino acids are needed to fold a stable RNA binding domain, and suggest that two regions within this domain may recognize different parts of the RNA.

MATERIALS AND METHODS
Cloning of S4 Gene Fragments-DNA fragments containing portions of the E. coli S4 gene flanked by the required restriction sites were obtained by PCR from pNO2801 (16) or from an M13 phage into which the S4 gene had been inserted and two stop codons introduced after Arg-104. The DNA primer sequences used for the 5′ ends of the expressed gene were A AAG CAT ATG GCA AGA TAT TTG GGT (start at S4 N terminus) and G CGT AAA CAT ATG CTG TCT GAC TAT GGT (start at Leu-48). (Each contains an NdeI site, CATATG.) Primer sequences used to introduce stop codons were TT TGG ATC CAA GCT TTA CTT GGA GT (wild type stop after codon Lys-206), ACG TGG ATC CTT ACT TGC CAG CAT CAA CTT C (stop after Lys-177), and G ACC GGA TCC TTA AAT TGC TTT ATG GCT AAC (stop after Ile-123). (Each contains a BamHI site, GGATCC.) PCR was carried out for 30 cycles using Vent DNA polymerase (New England Biolabs) and standard reaction conditions. PCR reaction products were cleaved with NdeI and BamHI, and ligated into pET11a (17) cut with the same restriction enzymes. Since NdeI cleaves poorly near DNA termini, a "self-ligation" reaction was performed on some of the PCR products before cleavage with this enzyme (18). Ligation reactions were used to transform E. coli HB101. After selection on media containing 100 μg/ml ampicillin, candidates were screened for inserts by restriction digestion and appropriate plasmids used to transform E. coli BL21(DE3) (17). These candidates were screened for overproduction of protein in the presence of isopropyl-1-thio-β-D-galactopyranoside, and the inserts of appropriate plasmids were sequenced by the dideoxy method either manually with ³²P-labeled DNA primers or on an automated instrument at the Johns Hopkins Core Genetics facility.
* This work was supported by grant GM29048 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
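The restriction sites carried by the primers in the cloning section can be checked directly from the printed sequences. The following is a minimal sketch, not part of the original methods: it assumes the standard recognition sequences CATATG (NdeI) and GGATCC (BamHI), and the dictionary labels and function names are hypothetical.

```python
# Recognition sequences for the two enzymes named in the cloning section.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as printed in the text (5' to 3'); labels are illustrative.
PRIMERS = {
    "start at N terminus": "A AAG CAT ATG GCA AGA TAT TTG GGT",
    "start at Leu-48": "G CGT AAA CAT ATG CTG TCT GAC TAT GGT",
    "stop after Lys-206": "TT TGG ATC CAA GCT TTA CTT GGA GT",
    "stop after Lys-177": "ACG TGG ATC CTT ACT TGC CAG CAT CAA CTT C",
    "stop after Ile-123": "G ACC GGA TCC TTA AAT TGC TTT ATG GCT AAC",
}


def find_sites(primer):
    """Return {enzyme: 0-based offset} for each site present in the primer."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def report():
    """Map each primer label to the restriction sites it contains."""
    return {label: find_sites(primer) for label, primer in PRIMERS.items()}


if __name__ == "__main__":
    for label, hits in report().items():
        print(f"{label}: {hits}")
```

Each 5′ primer should report only an NdeI site and each stop-codon primer only a BamHI site, which matches the cloning strategy described (NdeI/BamHI insertion into pET11a).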
Purification of Proteins-To prepare protein, cell growth and protein extraction procedures described elsewhere were followed with minor modifications (19). A single colony of BL21(DE3) transformed with the appropriate pET11a derivative was grown in broth medium with 50 μg/ml carbenicillin. Cells were induced with isopropyl-1-thio-β-D-galactopyranoside after reaching 0.8 OD at 600 nm and allowed to grow for another 3 h (37°C), yielding 4-5 g of cells/liter. The cells from 1 liter of culture were washed two times with 20 ml of 25% sucrose, 50 mM Tris-HCl, 1 mM EDTA, 6 mM 2-mercaptoethanol, 0.1 mM phenylmethylsulfonyl fluoride, and 0.1 mM benzamidine, resuspended with 20 ml of the same buffer and treated with 4 mg of lysozyme/g (wet weight) of cells on ice for 1 h. This was followed by two cycles of freezing and thawing to improve cell lysis.
The wild type S4 protein was extracted from the crude lysate by raising the salt concentration to 0.7 M NaCl, followed by centrifugation at 10,000 rpm for 30 min. The supernatant was then diluted with an equal volume of 6 M urea buffer (6 M urea, 20 mM KH₂PO₄, 0.5 mM DTT, pH 5.6) and then dialyzed against the same 6 M urea buffer. The protein fragments were not extracted into the supernatant by the high salt buffer, and had to be solubilized from the pellet of cell debris. 30 ml of 6 M urea buffer was added to the pelleted debris, the mixture stirred in the cold for 1 h, and centrifuged 40 min at 10,000 rpm.
Proteins in 6 M urea buffer were purified by high performance liquid chromatography using a Bio-Rad TSK SP-5-PW cation exchange column (75 × 7.5 mm) with a 3-h gradient from 0-40% 1 M KCl in 6 M urea buffer at a flow rate of 0.8 ml/min. Full-length S4 reproducibly yielded nearly 40 mg of pure protein from 1 liter of cell culture. After purification, all the proteins were dialyzed into TK buffer (30 mM Tris-HCl, 350 mM KCl, 0.5 mM DTT, pH 7.6), using dialysis membrane with a molecular mass cut-off of 3500 daltons (SpectraPor 3), and stored at −70°C. Protein purity was checked by electrophoresis of samples in polyacrylamide gels containing SDS or urea/acetate at pH 4.5; all preparations were better than 95% pure. Amino acid compositions were obtained for all the proteins and agreed with the predicted compositions. N-formyl methionine is missing from all the analyses, and we presume it has been proteolytically removed as it is in wild type S4. Protein concentrations were determined by absorbance at 280 nm using extinction coefficients calculated according to the method of Gill and von Hippel (20). The protein fragments and their extinction coefficients are listed in Table I. Note that previous work on S4-mRNA binding (7) used a larger extinction coefficient reported by Rhode et al. (21); binding constants cited from this paper have been corrected to account for this.
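The extinction coefficient calculation mentioned above can be sketched as follows. This is an illustrative example only: it assumes the commonly cited per-residue values from Gill and von Hippel (5690 for Trp, 1280 for Tyr, and 120 M⁻¹ cm⁻¹ per cystine, determined for denatured protein), and the toy sequence is hypothetical, not an S4 fragment. The values actually used for each protein are those listed in Table I.

```python
# Per-chromophore molar extinction contributions at 280 nm (M^-1 cm^-1).
# These are the commonly cited Gill and von Hippel values; treat them as an
# assumption of this sketch rather than the exact numbers behind Table I.
EPS_TRP = 5690
EPS_TYR = 1280
EPS_CYSTINE = 120


def extinction_280(sequence, n_disulfides=0):
    """Estimate the molar extinction coefficient at 280 nm from composition."""
    seq = sequence.upper()
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + n_disulfides * EPS_CYSTINE)


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    toy = "MAWKYLGY"  # hypothetical peptide: 1 Trp, 2 Tyr
    eps = extinction_280(toy)
    print(eps, concentration_molar(0.5, eps))
```

Because the coefficient depends only on Trp, Tyr, and cystine content, a deletion that removes any of these residues changes the coefficient, which is why each fragment needs its own value.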
Extension Inhibition Assays-Primer extension by reverse transcriptase was used to detect S4 bound to the α mRNA, approximately as described by Spedding and Draper (22). The α mRNA leader was synthesized by transcription with T7 RNA polymerase (purified in this laboratory) from pT7/T3α cut with XmnI, giving a 260-nucleotide transcript (6). Similar transcripts were prepared for mutant RNAs. The RNA was purified from an 8% denaturing acrylamide gel before use. The DNA primer was 21 nucleotides long and 5′ end-labeled with ³²P using polynucleotide kinase. Primer and RNA were annealed in a total volume of 40 μl by incubating 13.6 pmol of RNA with 48 pmol of primer in 50 mM Tris-HCl, pH 8.0, 75 mM KCl, 10 mM DTT at 65°C for 3 min. The sample was cooled to 50°C, and 8 μl of the same buffer containing 90 mM MgCl₂ was added so that the final Mg²⁺ ion concentration would be 15 mM. The reaction was incubated for an additional 20 min at 50°C to allow the RNA to renature, and then shifted to room temperature (5 min) before placing on ice.
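The concentrations in the annealing step follow from simple dilution arithmetic, sketched below. The volumes and amounts are those stated above; the function names are hypothetical, and the 5-fold dilution into the toeprinting reaction is an inference from the stated final concentrations (3 mM MgCl₂ and 57 nM mRNA), not a figure given in the text.

```python
def diluted_concentration(stock_conc, stock_vol_ul, start_vol_ul):
    """Concentration after adding a stock to a solution that lacks the solute."""
    return stock_conc * stock_vol_ul / (start_vol_ul + stock_vol_ul)


def conc_nM(amount_pmol, volume_ul):
    """pmol per microliter equals micromolar; multiply by 1000 for nanomolar."""
    return amount_pmol / volume_ul * 1000.0


# 8 ul of buffer with 90 mM MgCl2 added to the 40-ul annealing mix.
mg_mM = diluted_concentration(90.0, 8.0, 40.0)

# RNA and primer concentrations in the resulting 48 ul.
rna_nM = conc_nM(13.6, 48.0)
primer_nM = conc_nM(48.0, 48.0)

if __name__ == "__main__":
    print(f"Mg2+: {mg_mM:.1f} mM, RNA: {rna_nM:.0f} nM, primer: {primer_nM:.0f} nM")
    # Assumed 5-fold dilution into the toeprinting reaction.
    print(f"after 5-fold dilution: {mg_mM / 5:.1f} mM Mg2+, {rna_nM / 5:.0f} nM RNA")
```

The calculation reproduces the stated 15 mM Mg²⁺ and gives roughly 283 nM RNA with primer in about 3.5-fold molar excess.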
S4 protein and its fragments were warmed to 37°C for 30 min in their own storage buffer supplemented with 7 mM 2-mercaptoethanol, cooled at room temperature for 5 min, and then placed on ice before use. Both protein and subunits were kept on ice until use. The final buffer composition for the toeprinting reaction was 50 mM Tris-HCl, pH 8.0, 75 mM KCl, 10 mM DTT, 3 mM MgCl₂, and 0.7 mM 2-mercaptoethanol. The mRNA concentration was 57 nM, and dNTPs were 0.1 mM each. 200 units of SuperScript reverse transcriptase (Life Technologies, Inc.) were used for each 10-μl assay. The order of addition of components to the assay buffer was dNTPs, protein, and RNA with annealed primer. Following a 10-min incubation of all the components at 0°C, reverse transcriptase was added and extension allowed to proceed at the indicated temperature for 15 min; temperatures from 28 to 37°C were used. Reactions were then quenched with 10 μl of formamide dye mix (99% (v/v) deionized formamide, 10 mM NaOH, 1 mM Na₂EDTA, and 0.05% each of bromphenol blue and xylene cyanol), placed in a boiling water bath for 3 min, and then placed on ice before loading half of the reaction onto an 8% denaturing polyacrylamide sequencing gel (40 × 20 × 0.035 cm). Dideoxy sequencing reactions, carried out with avian myeloblastosis virus reverse transcriptase (Life Sciences, Inc.) and the same DNA primer as used for toeprint reactions, were included on each gel for reference. Gels were exposed to preflashed x-ray film at −70°C, and quantitation of bands was done on a Molecular Dynamics scanning densitometer.
Circular Dichroism Measurements-Circular dichroism (CD) spectra were recorded on a Jasco J-710 spectropolarimeter. A rectangular cuvette with a 2-mm path length was used in all measurements. Spectra were taken in TK buffer to which 2-mercaptoethanol had been added to 7 mM; protein concentrations ranged from 0.29 to 0.51 mg/ml. Series of spectra at increasing temperature were obtained, allowing 10 min for the solution to equilibrate at each new temperature. Six scans were accumulated and averaged for each spectrum, recording the spectra at 0.2-nm intervals between 190 and 260 nm at a scan rate of 10 nm/min and a bandwidth of 0.5 nm. The spectra were corrected by subtracting the background ellipticity due to buffer alone. Because of the high concentration of KCl, the spectra are unreliable below 205 nm.
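The spectral processing described above (averaging six scans, subtracting the buffer background, and disregarding data below 205 nm) can be sketched as follows. This is a minimal illustration with hypothetical function names and plain Python lists; it is not the instrument software's procedure, and scanning from 260 down to 190 nm is an assumption about scan direction.

```python
def wavelengths(start=260.0, stop=190.0, step=0.2):
    """Wavelength grid for one scan, sampled every `step` nm (inclusive)."""
    n = int(round((start - stop) / step)) + 1
    return [round(start - i * step, 1) for i in range(n)]


def average_scans(scans):
    """Point-by-point mean of several scans recorded on the same grid."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def corrected_spectrum(sample_scans, buffer_scans, grid, cutoff_nm=205.0):
    """Average, subtract the buffer baseline, and drop points below the cutoff."""
    sample = average_scans(sample_scans)
    baseline = average_scans(buffer_scans)
    return [(w, s - b) for w, s, b in zip(grid, sample, baseline) if w >= cutoff_nm]


if __name__ == "__main__":
    grid = wavelengths()
    print(len(grid), "points per scan")
    # At 10 nm/min, one 70-nm scan takes 7 min, so six scans take 42 min.
    print((260.0 - 190.0) / 10.0 * 6, "min per averaged spectrum")
```

With 0.2-nm sampling between 190 and 260 nm each scan holds 351 points, of which only those at or above 205 nm are kept because of the KCl absorbance noted above.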
Filter Binding Assays-Protein-RNA binding affinities were estimated by a nitrocellulose filter binding assay, using essentially previously described procedures to synthesize ³⁵S-labeled RNA and carry out the binding (23). RNA was renatured at 42°C for 20 min, rather than the 65°C temperature used previously, and there was no urea in the final reaction mix, since the proteins were stored in renatured form. The final buffer concentrations were 30 mM Tris-HCl, 350 mM KCl, 7 mM 2-mercaptoethanol, and 8 or 2 mM MgSO₄ at pH 7.6 for the rRNA and mRNA titrations, respectively. Labeled RNAs were transcribed by T7 RNA polymerase from the following DNA templates: the α mRNA leader (139 nucleotides) from pT7/T3α cut with HindIII (7); the 5′ 559 nucleotides of 16 S rRNA from pTRV cut with EcoRV (8); and a 505-nucleotide fragment of 23 S rRNA from pTRS2 cut with SalI (24).
Gel Mobility Shift Assays-The RNA used for gel mobility shift assays for protein binding was a fragment from nucleotides 16 to 127 of the α mRNA leader; its preparation and purification have been described previously (25). Preparation of protein-RNA complexes at 0°C was carried out in the same way as for the filter binding assay. The total volume was 10 μl, with an RNA concentration of 5 μM, and the final buffer was 30 mM Tris-HCl, pH 7.6, 350 mM KCl, 10 mM MgCl₂, and 3.5 mM 2-mercaptoethanol. After a 10-min incubation of the complex, 1.5 μl of dye buffer (0.05% bromphenol blue, 50% glycerol, 350 mM KCl, 10 mM MgSO₄, 30 mM Tris-HCl, pH 7.6) was added.
The reactions were loaded onto a polyacrylamide gel (7.7% acrylamide, 0.2% bis-acrylamide, 10 cm long and 0.8 mm thick) with 30 mM Tris-HCl, 350 mM KOAc, and 2 mM MgSO₄, pH 7.6, as the running buffer. The gel was cooled to 4°C before loading the samples; electrophoresis was carried out at 75 V for 3 h at 4°C. The running buffer was recirculated periodically to prevent formation of a pH gradient. The RNA and complexes were stained with either 0.0001% ethidium bromide or 2% methylene blue.

RESULTS
Overexpression of S4 and S4 Fragments-A number of ribosomal proteins have been cloned and expressed in E. coli using pET vectors that have extremely low levels of transcription in the absence of inducer (19,26). The intact E. coli S4 protein was expressed by this method, and could be purified in higher yield (~40 mg of protein/liter of cell culture; see "Materials and Methods") and with greater ease than possible by standard methods that rely on extraction of protein from purified ribosomes or ribosome subunits (23,27). Six different S4 fragments were also prepared using the same plasmid expression system; the end points are listed in Table I. End points were selected to correspond to fragments previously prepared in the Craven laboratory and tested for ribosomal RNA binding (see "Discussion").
mRNA Binding of Ribosomal Protein S4 Fragments
S4 exhibits a high level of nonspecific binding to tRNA and 23 S rRNA fragments in the filter binding assay. Under the salt conditions used here, the apparent nonspecific binding affinities are on the order of 1 μM⁻¹, and the maximum retention extrapolates to nearly 1.0 (8,28). The same behavior is seen in the binding of overexpressed S4 with a 505-nucleotide fragment of 23 S rRNA (K ≈ 3 μM⁻¹ and maximum retention ≈ 1.0, data not shown). The six S4 fragments were also used in the same filter assay; representative titrations of RNA fragments with S4(2-104) are shown in Fig. 1B. The binding curves for the mRNA, 16 S rRNA, and 23 S rRNA fragments are indistinguishable, and all three extrapolate to a maximum retention of ~1.0. All the protein fragments bound all the RNA fragments with apparent affinities ranging between 1 and 4 μM⁻¹. The curves varied considerably between experiments, and it was not possible to reliably determine whether binding to the 16 S rRNA and α mRNA fragments was significantly stronger than nonspecific binding for any of the protein fragments.
The high level of nonspecific binding seen for S4 and its fragments seems to be at variance with results from Craven's laboratory, in which binding of similar proteins to 23 S rRNA could not be detected (Ref. 12; see also "Discussion"). The filter "pull-through" assay used by that group measures the loss of labeled S4 from nitrocellulose filters upon titration with 16 S rRNA, and therefore depends on the inability of S4-rRNA complexes to bind nitrocellulose. This behavior is observed only for 16 S rRNA fragments with 3′ termini extending beyond approximately nucleotide 900; presumably the larger rRNA "surrounds" the protein and prevents it from contacting the filter (8). Nonspecifically bound S4 might not be protected by the RNA in the same way, rendering the pull-through assay less sensitive to nonspecific binding than the filter retention assay shown in Fig. 1. Since the filter retention assay cannot distinguish weak but specific binding of the S4 fragments to rRNA from nonspecific binding, we have looked for other assays to determine whether the fragments retain specific interactions with the α mRNA.
Protein-mRNA Interactions Detected by "Toeprint" Assay-The toeprint assay was developed by Hartz et al. (29,30) to study ribosome binding to mRNAs. It involves reverse transcription from a DNA primer hybridized downstream of the ribosome initiation site; the ribosome-mRNA complex with initiator tRNA is very stable and quantitatively stops the reverse transcriptase (RVT) about 15 nucleotides into the coding region. Under appropriate conditions, 30 S subunits bound to mRNA in the absence of initiator tRNA can also be observed in the assay, although it is a much less stable complex (30). In studies of the α mRNA, we found that either S4 alone or 30 S subunits alone produced a similar set of RVT pauses near the 3′ edge of the pseudoknot structure (22,31). The S4 toeprint was difficult to observe because RVT was inhibited by moderate concentrations of the protein. Higher concentrations of RVT overcome this inhibition, and we have obtained reproducible toeprint patterns at protein concentrations up to ~0.6 μM, as shown in Fig. 2A. At higher S4 concentrations the amount of transcript decreases dramatically, but 0.6 μM protein should be sufficient to drive >80% of the RNA into complexes. The most intense pausing is seen at nucleotides A119-U120, but additional stops are observed at U124-G125, C112, and C101-G102; these nucleotides are indicated on a diagram of the mRNA secondary structure in Fig. 3. The pattern is unchanged at 28 or 37°C (not shown). Although reverse transcription to C112 and C101 requires disruption of one or two pseudoknot helices, previous affinity measurements with 3′ deletions of the mRNA have suggested that fragments terminating at C101 retain some binding specificity (28).
Since (i) S4 and 30 S subunits are able to induce similar RVT pauses and (ii) the most intense pauses occur near the 3′ end of the pseudoknot structure, we think it likely that the RVT is sensing stabilization of the pseudoknot by S4, rather than S4 protein itself. Several weak pauses toward the 5′ end of the mRNA are usually seen as well (Fig. 2). Since RVT transcription should completely unfold the pseudoknot by the time it reaches these pause sites, they are presumably induced by nonspecific S4 binding. To test this, two mRNA mutations with reduced S4 affinity were used in the toeprint assay, CKT12 (G98 → C) and CKT4 (C100C101 → GG). Both have binding constants on the order of 1 μM⁻¹ in the filter binding assay (7). Neither of these induced pauses between C101 and G125, arguing that the wild type RNA pattern of toeprints in this region is due to a specific S4-mRNA complex (Fig. 2B and data not shown). Weak pauses toward the 5′ end of the mutant mRNAs are similar to those seen in the wild type mRNA and are presumably associated with nonspecific binding.
Filter binding assays were carried out with the same RNA under an identical protocol of incubations as used for the toeprint assay. The binding constant measured was 7.2 ± 0.7 μM⁻¹, slightly lower than measured under the conditions of Fig. 1A. Determination of an apparent binding constant from the toeprint assay is somewhat uncertain, since nonspecific inhibition of the transcriptase at higher S4 concentrations complicates quantitation of the paused transcripts and precludes an accurate determination of the efficiency with which bound S4 induces RVT pausing. From densitometry of the Fig. 2A gel, we estimate that the A119-U120 pause reaches 50% of its maximum possible value at ~0.3 μM S4 concentration; thus, K ≈ 3 μM⁻¹. It is likely that RVT biases the measurement by displacing bound S4, so 3 μM⁻¹ should be considered a lower limit. The affinity of the complex inducing the toeprint pattern is therefore in the range expected for specific binding.
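The relation between the binding constants and concentrations quoted above can be illustrated with a simple single-site binding isotherm. This is a sketch under stated assumptions, not the analysis used in the paper: it treats the free protein concentration as equal to the total (reasonable only while protein is in large excess over the 57 nM RNA), uses association constants in μM⁻¹ and concentrations in μM, and the function names are hypothetical.

```python
def fraction_bound(k_assoc_per_uM, protein_uM):
    """Fraction of RNA in complex for a single site: K[P] / (1 + K[P])."""
    x = k_assoc_per_uM * protein_uM
    return x / (1.0 + x)


def k_from_half_saturation(protein_half_uM):
    """Association constant implied by the protein concentration at 50% binding."""
    return 1.0 / protein_half_uM


if __name__ == "__main__":
    # Filter-binding constant of 7.2 uM^-1 at 0.6 uM protein.
    print(f"{fraction_bound(7.2, 0.6):.2f}")
    # Toeprint pause half-maximal at ~0.3 uM S4.
    print(f"{k_from_half_saturation(0.3):.1f} uM^-1")
```

With K = 7.2 μM⁻¹, 0.6 μM protein gives a bound fraction just above 0.8, in line with the estimate that this concentration drives more than 80% of the RNA into complexes, and a half-maximal pause at ~0.3 μM corresponds to K ≈ 3 μM⁻¹.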
Table I footnotes:
a The numbering is that of the 206 amino acid rpsD gene (46). S4 was originally reported to have 203 amino acids (47), because (i) N-terminal processing removes the initial methionine, and (ii) Leu-90 and Ser-144 were omitted, as later shown by sequencing of the gene. Thus the numbering reported here may differ by two or three amino acids from other papers.
b Extinction coefficient at 280 nm calculated by the method of Gill and von Hippel (20), except for S4, for which the value measured by Dodd and Hill (48) is given.
Toeprints of S4 Fragments-The mRNA toeprint assay was repeated with each of the S4 fragments under the same conditions used for the intact S4; these results are shown in Fig. 4. S4(2-177) and S4(48-177) gave essentially the same toeprint pattern as S4(2-206), although the relative proportions of paused transcripts were much less. For instance, only 0.016 or 0.025 of the transcripts terminated at A119-U120 with 0.6 μM S4(48-177) or S4(2-177), respectively. The intensity of the pauses at C112 and C101-G102 increases slightly, and a new pause at A113 appears, probably because RVT reads through U120 a much greater fraction of the time with these protein fragments. The overall diminution in toeprint intensity is consistent with the lower binding constant observed with the S4 fragments in the filter binding assay; faster exchange of S4 presumably allows RVT to transcribe through the pause sites at greater frequency. Despite the weaker binding, the similar toeprint patterns suggest that deletions of amino acids 2-47 and 178-206 do not alter binding specificity.
Two of the remaining fragments, S4(2-123) and S4(2-104), gave patterns similar to that of intact S4 at the U124-G125, C112-A113, and C101-G102 pause sites, but did not show any effect on the cluster of pauses around U120 (Fig. 4, C and D). The simplest interpretation of this result is that the protein forms two sets of contacts with the RNA, one of which is localized to residues 124-177 and specifically stabilizes the U120 region.
All of the S4 fragments were also assayed with CKT4 RNA. No toeprint pattern in the C101-G125 region was observed for any of them, arguing that the pause sites seen in Fig. 4 reflect specific S4-mRNA complexes (data not shown).
Gel Shift Assays-After renaturation of α mRNA fragments containing the pseudoknot structure, the RNA can be resolved into two conformers with different mobilities in non-denaturing gels run at low temperatures; the proportion of the two conformers depends on the concentration of Mg²⁺ ion present during the renaturation (footnote 2). Only the faster mobility conformer is bound by S4 and appears at much slower mobility in a "gel shift" experiment. An example of the gel shift, using unlabeled RNA and excess S4, is shown in Fig. 5. The complex is marginally stable to electrophoresis, as it can be observed only under stoichiometric binding conditions. Attempts to observe the gel shift were made with all six S4 fragments. Only the two longest fragments that gave the complete toeprint pattern, S4(2-177) and S4(48-177), showed any effect. For both fragments, the faster mobility conformer disappeared from the gel and a very faint band could be observed at the position of the S4-RNA complex (Fig. 5). Failure to observe a quantitatively shifted band, as well as a smear of RNA extending from the shifted band position to the fast conformer position, suggest that the S4 fragment-RNA complexes dissociate more readily during electrophoresis than does the complex with intact S4. Nevertheless, the fact that only one of the two RNA conformers was affected by the two protein fragments is an additional argument that their interaction with the mRNA is specific, although weaker than for intact S4.
Protein Folding Assessed by CD-The N- and C-terminal deletions that weaken S4 binding to the α mRNA may do so simply because the protein structure is destabilized, and not because protein-RNA contacts have been removed. To determine whether the deletions have caused major disruptions in the S4 structure, CD spectra of S4 and each fragment were taken over a range of temperatures. To make sure that the proteins were renatured after purification from denaturing solvent, spectra were taken at 8°C before and after heating to 37°C for 30 min. The warming did not alter the spectrum of any of the fragments (data not shown).
2 T. C. Gluick and D. E. Draper, submitted for publication.
The CD spectrum of S4 extracted from ribosomes has been reported as part of an extensive physical study of the protein (32). Although obtained under different buffer conditions, it is essentially the same as what we observe (Fig. 6A). The approximate Tm of S4 is 41-45°C, from observations of the irreversible unfolding of the protein in scanning calorimetry experiments (32,33). This is consistent with the large decrease in negative ellipticity that we observe between 37 and 55°C (Fig. 6C). Fig. 6 (A and B) also displays the CD spectra of the S4 fragments at 8°C. As successive deletions are made to the C terminus in proteins with an intact N terminus, there are incremental decreases in the intensity of the CD signal (Fig. 6A), as if secondary structures can be sequentially removed from the C terminus without major disruption of the remaining protein. Deletion of N-terminal amino acids has a more unusual set of effects on the protein structure, as seen in comparisons of panels A and B in Fig. 6. S4(2-177) and S4(48-177) have virtually identical spectra, suggesting that the N terminus has little or no secondary structure. (The ellipticity of S4(48-177) is a maximum of 6.8% more negative than that of S4(2-177) at 218 nm, and less than 1% different at the spectrum minimum of 210 nm.) However, S4(48-123) has substantially less secondary structure than S4(2-123). In the same way, S4(48-104) is essentially unstructured, while S4(2-104) has significant secondary structure. An interpretation of these results is that the central part of the protein (48-104) is stabilized by either the N-terminal 47 amino acids or a region near the C terminus (124-177). This interpretation is consistent with the observations that fragments producing a specific toeprint have in common only amino acids 48-104, although S4(48-104) itself does not show any specific RNA binding.

DISCUSSION
Comparison of 16 S rRNA and α mRNA Binding by S4 Fragments-N- and C-terminal deletions of S4, similar to the ones described in this work, were prepared by Craven and colleagues and tested for binding to 16 S ribosomal RNA and, in some cases, for ribosome assembly. Mild trypsin digestion of an S4-16 S rRNA complex yielded the 48-206 fragment, which competed with intact S4 for 16 S rRNA binding and promoted ribosome assembly (13). However, several proteins were missing from the assembled 30 S subunit, possibly because the RI intermediate failed to undergo the RI → RI* activation step necessary for complete assembly (34). Deletions of the C terminus have been selected in several laboratories as pseudorevertants from S12 mutants that are streptomycin-dependent (35-37). The mutant proteins generally are shorter than wild type by 20-30 amino acids, bind 16 S rRNA less strongly than wild type S4 (38,39), and tend to be temperature sensitive for ribosome assembly (10). One mutant terminated between Leu-171 and Lys-177, and was estimated to bind 16 S rRNA with about the same affinity as wild type protein, ~8 × 10⁶ M⁻¹ (12). The above studies show that S4 specificity for 16 S rRNA resides in residues 48-177. We find that the same region is responsible for S4-α mRNA binding specificity: S4(2-177) and S4(48-177) both give the same toeprint pattern as intact S4, and specifically interact with the pseudoknot form having faster mobility in gel electrophoresis. The C-terminal deletion decreases the S4-mRNA binding affinity, suggesting that this region either stabilizes the rest of the protein or makes nonspecific contacts with the mRNA. This is consistent with qualitative observations made by Daya-Grosjean et al. (39) and Green and Kurland (38) of truncated S4 proteins binding to 16 S rRNA, although Changchien et al. (12) obtained a binding constant closer to that of intact S4 with the C-terminal deletion they examined. Our own filter binding measurements, made under conditions similar to those used by Changchien et al. (12) but with the 5′ domain of 16 S rRNA instead of the intact rRNA, show that S4(2-177) binds rRNA about 6-fold more weakly than the intact protein.
Streptomycin-independent revertants of S4 do not regulate α mRNA translation in vivo (10,11), which led to the suggestion that C-terminal sequences deleted in these mutants are required for translational repression activity but not ribosome assembly (12). Our finding that S4(2-177) binds α mRNA specifically suggests that defective regulation by these mutants is not due to lack of α mRNA recognition. Two factors may be responsible: the weaker S4-mRNA binding affinity will require a higher pool size of free S4 to achieve regulation, and the mutants are turned over much more rapidly than wild type S4 and do not accumulate as free protein (40). Thus there is no reason to postulate a specific role for the S4 C terminus in α mRNA binding, or separate mRNA and rRNA binding domains.
S4 fragments with C-terminal deletions produced by Me₂SO-HBr reaction with Trp-170, hydroxylamine cleavage at Asn-123/Gly-124, or cyanogen bromide reaction at Met-105 have been described (14,15,41). All of these interact preferentially with 16 S rRNA over 23 S rRNA in an assay in which the binding of labeled protein to nitrocellulose filters is prevented by formation of the protein-RNA complex (42). The ability of the fragments cleaved at Trp-170 and Met-105 to participate in ribosome assembly was investigated; both fragments promote assembly of 30 S subunits with a full complement of proteins, but sedimentation of the subunits shows either a smaller S value (Trp-170, Ref. 14) or a much broader peak indicating heterogeneous conformations (Met-105, Ref. 41). From these experiments, Conrad and Craven (41) concluded that the sequence 48-104 contained sufficient information for RNA recognition, but that N- and C-terminal sequences are essential for proper assembly of the 30 S subunit. Because others had found UV-induced cross-links of the C-terminal half of S4 to rRNA (43) and observed rRNA-induced protection of Lys-121 and Lys-148 from reductive methylation (44), the possibility was raised that a region C-terminal to Arg-104 makes a second set of RNA contacts (41).
These results on the 16 S rRNA binding capacity of S4 fragments are again consistent with the behavior we see for similar fragments binding α mRNA. S4(2-104) and S4(2-123) both give a toeprint pattern missing one set of bands found with S4(2-206), but otherwise identical. Neither of these fragments shows a gel shift with the mRNA; probably the binding affinity is even weaker than with S4(2-177) and S4(48-177). The most straightforward interpretation of these results is that a set of RNA-protein contacts has been deleted in the fragments, although indirect effects of the 124-177 region on the conformation of the protein-mRNA complex are also a possibility. In either case, it appears that a region in the C-terminal half of the protein is needed to form a functional S4 complex with both α mRNA and 16 S rRNA.
Although the results obtained with S4(2-104) and S4(48-206) suggest that RNA binding specificity should be retained by S4(48-104), this fragment does not interact in any specific way with the mRNA. The somewhat larger fragment S4(48-123) may bind specifically but very weakly. The CD spectra suggest a reason for the lack of binding in these fragments. S4(48-104) has little, if any, secondary structure, and the little secondary structure present in S4(48-123) denatures above 25°C. The structure of the 48-104 region may need either N- or C-terminal sequences to fold stably.
Conservation of S4 Domains. S4 homologs from a number of organisms have now been sequenced. In Fig. 7 we show the E. coli S4 sequence, and mark those positions at which 10/11 bacterial and chloroplast sequences (including E. coli) are identical, or at which 4/5 eukaryotic sequences are identical to E. coli. The conservation of the middle region from ~50 to 140 is quite striking. The eukaryotic S4 homologs show homology with each other along their entire lengths and are about the same size as the bacterial homologs, but the N- and C-terminal regions cannot be aligned with the bacterial sequences. Also shown in Fig. 7 is a prediction of the protein secondary structure using the PHD program and the set of 11 aligned bacterial and chloroplast sequences; the program has been about 70% accurate overall (45).
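The kind of conservation criterion described above can be illustrated with a short Python sketch. This is not the procedure used to prepare Fig. 7: the function name, the toy alignment, and the reading of "10/11" as "at least 10 of 11 aligned sequences share the same residue" are all assumptions made for illustration.

```python
from collections import Counter


def conserved_positions(alignment, min_identical):
    """Return 0-based column indices at which at least `min_identical`
    sequences carry the same residue. Gap characters ('-') are ignored.

    Hypothetical helper for illustration only.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        # Count residues in this column, skipping gaps.
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if counts and counts.most_common(1)[0][1] >= min_identical:
            conserved.append(col)
    return conserved


# Made-up toy alignment; these are NOT real S4 sequences.
toy_alignment = [
    "MKRLG",
    "MKRIG",
    "MARLG",
    "M-RLG",
]

# Columns where at least 3 of the 4 toy sequences agree.
toy_conserved = conserved_positions(toy_alignment, min_identical=3)
```

For a set of 11 aligned sequences, the same function would be called with `min_identical=10` under the assumption stated above.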
Taking into consideration the RNA binding properties of the fragments, the sequence conservation of the protein, and the predicted secondary structure, we suggest that S4 can be divided into four regions. The N-terminal 46 amino acids of the protein are not predicted to have secondary structure, consistent with the removal of these amino acids by trypsin digestion of an S4-rRNA complex (13) and the nearly identical CD spectra of S4(2-177) and S4(48-177); these residues are neither conserved nor essential for mRNA or rRNA binding. The region from 48-104 is probably responsible for most of the RNA contacts, is predicted to have extensive α-helical structure, and is also well conserved. (This region is unstructured by itself, however, and is therefore not a protein domain in the sense of an independently folding structure.) A third region extends from 105 to somewhere between 137 and 145; it is also well conserved and predicted to have α-helix and β-sheet structure. This is potentially the region responsible for the altered toeprint seen with fragments terminating at Ile-123 and Arg-104. The fourth region of S4 extends from ~145 to the C terminus; it is neither conserved nor predicted to have much secondary structure. At least the sequence from 178 to 206 is not essential for mRNA or rRNA binding, although it appears to increase the binding affinity without altering specific interactions.
Finally, we note that S4(48-177), which contains the RNA recognition domain, is small enough that its structure could, in principle, be determined by NMR, although the marginal stability of the fragment at 30-35°C makes such experiments difficult. The conservation of this region suggests that a homologous fragment with higher stability could be isolated from thermophilic organisms.