Identical RNA-Protein Interactions in Vivo and in Vitro and a Scheme of Folding the Newly Synthesized Proteins by Ribosomes*

Background: Ribosomal PTC acts as a protein folding modulator in vivo and in vitro.
Results: A fixed set of nucleotides in the PTC interacts to fold polypeptides in vivo and in vitro.
Conclusion: Folding all proteins through interaction with the same set of nucleotides in PTC implies they have intrinsic homology.
Significance: Hundreds of proteins showed an identical cumulative hydrophobicity plot for amino acids.

A distinct three-dimensional shape of rRNA inside the ribosome is required for the peptidyl transfer activity of its peptidyltransferase center (PTC). In contrast, even the in vitro transcribed PTC RNA interacts with unfolded protein(s) at about five sites to let them attain their native states. We found that the same set of conserved nucleotides in the PTC interact identically with nascent and chemically unfolded proteins in vivo and in vitro, respectively. The time course of this interaction, difficult to follow in vivo, was observed in vitro. It suggested nucleation of folding of cytosolic globular proteins vectorially from hydrophilic N to hydrophobic C termini, consistent with our discovery of a regular arrangement of cumulative hydrophobic indices of the peptide segments of cytosolic proteins from N to C termini. Based on this observation, we propose a model here for the nucleation of folding of the nascent protein chain by the PTC.

Polypeptides are synthesized in the peptidyltransferase center (PTC) of the large ribosomal subunit to form functional proteins with distinct three-dimensional structures. To understand the chemistry of protein synthesis, use of PTC RNA in vitro was found to be inadequate (1), although it binds to the tRNAs in different functional sites (1). Crystallography and cryomicroscopy of the ribosome in complex with different factors (2-4) have provided a wealth of information on protein synthesis. Some cryo-EM studies have even attempted to understand how a nascent polypeptide can form secondary or tertiary structures inside the ribosome (5).
In contrast to protein synthesis, in vitro transcribed PTC RNA independently or as a part of the 70 S and 50 S ribosomes and the 23 S rRNA can catalyze the folding of proteins (6-20). The unfolded forms of a number of unrelated proteins interact with specific sites of the PTC RNA to fold gradually (18-20). This in vitro activity of PTC prompted us to check whether it has any physiological relevance. Earlier we demonstrated the inhibition of activity gain of proteins in vivo and in vitro by the PTC-binding antibiotics (14). To understand the mechanism of ribosome-mediated protein folding in vivo, we have isolated the ribosome-bound nascent proteins (18) from living bacterial cells. We used quick chilling followed by UV cross-linking to block the ongoing in vivo processes in the log phase bacterial cells and then isolated a C-terminally tagged protein-bound population of ribosomes as the ribosome nascent chain complex (18). A large population of full-length nascent protein was found to be associated with the 50 S ribosome, and a small fraction was associated with the 70 S population (18). Therefore, the release of nascent protein from the ribosome into the cytosol is a slower process (21) than its synthesis. This appears to be consistent with the slow activity gain of nascent protein observed by us (6-9, 14) and others (22,23). The cohort of chaperones like DnaK, DnaJ, and trigger factor (23) with their exposed hydrophobic patches also provides a unique environment in the cytosol and influences the rate of 50 S subunit-mediated protein folding to some extent (17). Using Western blotting, toe printing, and MALDI-TOF MS/MS analysis, we showed that the nascent protein in vivo and the unfolded protein in vitro interact with the same set of nucleotides in the PTC. The time course of this nucleotide-amino acid interaction cannot be followed in vivo, but we could follow it in vitro. It suggests that the nucleation of folding takes place from the N to C terminus of the linear polypeptide.

EXPERIMENTAL PROCEDURES
Chemicals and Strains-The Escherichia coli strain containing a plasmid with mutated (2251G→A) E. coli rrnB operon was kindly provided by Prof. A. E. Dahlberg, Brown University. Radioisotopes were from the Board of Radiation Isotope (Government of India). Restriction enzyme BsrI was purchased from Fermentas. The ThermoScript RT kit was purchased from Invitrogen. The DNA sequencing kit was purchased from USB. All other reagents were purchased from Sigma-Aldrich.
Isolation of Nascent Protein-bound Ribosome That Was Synthesizing the Protein in Log Phase Bacterial Cells-The procedure to isolate nascent protein-bound ribosomes from growing bacterial cells has been described (18). The full-length nascent protein-bound ribosome population consisted of 50 S subunit as the major part and a small amount of 70 S ribosome. As described in the text, C-terminally His-tagged nascent protein HspH was isolated from cells producing both the wild type and the G2251A mutant ribosomes as follows. The gene for the HspH protein from Bradyrhizobium japonicum was cloned in pET24b (Kan+) vector. The cI857 gene was cloned in a plasmid conferring neomycin resistance, and the rrnB operon with a 2251G→A mutation under the control of the PL promoter was cloned in an ampicillin-resistant plasmid. The colonies of transformants resistant to kanamycin, neomycin, and ampicillin were tested for the presence of the three plasmids. One such colony was grown in 100 ml of LB (1% tryptone, 1% NaCl, and 0.5% yeast extract) at 37°C. At an A600 of about 0.4, isopropyl 1-thio-β-D-galactopyranoside was added to the culture to a final concentration of 0.4 mM to induce the synthesis of HspH protein, and the rrnB operon was activated by raising the temperature to 42°C to inactivate cI protein. After 1 h of induction at 42°C, protein synthesis in these cells was abruptly stopped by quickly chilling in ice and addition of frozen medium. Cells were immediately UV-irradiated, harvested, and subjected to a French press, and then crude ribosomal extract was obtained by low and high speed centrifugations. One half of this crude extract was subjected to affinity chromatography through a nickel-nitrilotriacetic acid column (Qiagen) in which the full-length HspH protein is expected to bind by virtue of the hexahistidine residues at the C terminus. After applying the extract, the column was thoroughly washed with 10-60 mM imidazole. The flow-through and wash were collected. The bound protein was eluted with 250-350 mM imidazole. Both the eluted fractions and the flow-through with wash were centrifuged at 230,000 × g for 2 h at 4°C to have ribosome and its subunits in the pellet, free or cross-linked to HspH. They were dispersed in suitable buffers and run separately in a 5-20% sucrose density gradient with a 30% sucrose cushion at the bottom. The crude ribosome extract and purified E. coli ribosomes (as control) were also subjected to the same density gradient centrifugation. Fractions were collected, and A260 was measured. From the 23 S rRNA (isolated from the ribosome population that was cross-linked to the nascent protein), a small portion (G2156 to G2454) was amplified by RT-PCR. BsrI restriction digestion was done on the PCR product from the total as well as the nickel column-eluted ribosomes.
Isolation of Ribosomal RNA-Nascent Protein Complex-To separate ribosomal proteins from the ribosomal RNA, the ribosomal particles (both UV-cross-linked and non-cross-linked) were treated with 3 M urea in 50 mM Tris, pH 7.5, 10 mM magnesium chloride, and 100 mM sodium chloride in ice for 3 h followed by gel filtration in the same buffer using a Bio Gel P-100 (Bio-Rad) column, and the rRNA was obtained in the excluded volume. RT-PCR was done on the purified rRNA using the following primers specific for the PTC region of the rRNA: 5′N (5′-GAGAAAGAGAAGCTTGTACCCGCGGCAAGA-3′) and BG32 (5′-CCGAATTCGGATCCGCGCCCACGGCAGATACTG-3′). The plasmid (with the PTC RNA cloned in it) was also used to amplify the region using the above two primers, and the product served as the length standard. The identity of the lengths of the two PCR products confirmed that this stretch of RNA was free from ribosomal proteins (as seen from the amplified DNA from the RNA extracted from UV-cross-linked nontranslating 70 S ribosome), which would otherwise block the PCR.
Western Blotting to Identify the Nascent Protein Bound to the Ribosomal RNA-The rRNA-HspH complex after treatment with RNase A was separated by 10% SDS-PAGE along with the purified native HspH protein as a control and then transferred onto a Hybond P membrane (Amersham Biosciences) with the aid of a Mini Trans-Blot system (Bio-Rad). The membranes were incubated with anti-penta-His antibody (Qiagen), and the immunoblot was detected with Western blotting Luminol reagent (Santa Cruz Biotechnology).
Identification of the rRNA Nucleotides That Interact with the Nascent and Chemically Unfolded Proteins-The toe printings (primer extensions) were done on the rRNA-nascent protein or rRNA-refolding protein complexes according to the procedure reported earlier (19) (see Figs. 3 and 4). Primer BG32 (5′-ACCCCGGAATTCGCGCCCACGGCAGATAGG-3′) was annealed with protein-cross-linked PTC RNA. Annealed primers were labeled with [α-32P]dCTP following the 3-deoxynucleoside triphosphate method at 55°C using ThermoScript reverse transcriptase (Invitrogen). Labeled primers were extended at 58°C after the addition of all four dNTPs in excess by the same enzyme for about 45 min. The products were precipitated, washed with 70% ethanol, and analyzed on a 6.5% polyacrylamide gel in 8 M urea next to a sequencing ladder of domain V rDNA. The sequencing ladder was obtained using the same primer by Thermo Sequenase DNA polymerase (Thermo Sequenase™ Cycle Sequencing kit, USB).
In the time course studies (see Fig. 4), the band intensity of reverse transcriptase stops for each of the five RNA-unfolded protein interacting sites (marked in Fig. 4, E, F, and G) was expressed as percentage of the total intensity of all the bands in the same lane using Quantity One software. The percent band intensities at all time points for the same block were plotted against time as shown in Fig. 4, B (ovalbumin), C (bovine carbonic anhydrase II (BCA)), and D (lysozyme). The cross-linking was not saturated as evidenced from the presence of top bands in each lane in the gel arising from full-length reverse transcription. For each protein, the results of three experiments were plotted with different colors in the figures.
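The normalization described above (each reverse transcriptase stop expressed as a percentage of the total intensity of all bands in the same lane) can be sketched as follows. This is only an illustration of the arithmetic, not the Quantity One procedure itself; the function name and the intensity values are hypothetical.

```python
def percent_band_intensities(lane):
    """Express each band as a percentage of the total signal in its lane.

    lane: dict mapping a band label to its raw intensity (arbitrary units).
    """
    total = sum(lane.values())
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return {band: 100.0 * value / total for band, value in lane.items()}


# Hypothetical raw intensities for one lane (illustrative, not measured data).
lane = {
    "U2473": 10.0,
    "U2491": 15.0,
    "C2551": 20.0,
    "A2560": 25.0,
    "A2587": 5.0,
    "full_length": 25.0,  # top band from full-length reverse transcription
}
percents = percent_band_intensities(lane)
```

Repeating this for the lane of each time point gives, for every stop, the series that is plotted against time.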
Identification of the Amino Acids of Nascent and Chemically Unfolded Proteins That Bind to the Specific PTC RNA Nucleotides-MALDI-TOF MS and MS/MS studies were done with the nascent and chemically unfolded protein complexes with the PTC RNA as described earlier from our laboratory (19).
Filter Binding Study-Denatured BCA was allowed to refold in the buffer (50 mM Tris-HCl, pH 7.5, 10 mM Mg(OAc)2, and 100 mM NaCl) containing [α-32P]UTP-labeled PTC RNA at room temperature. After the addition of denatured protein into the reaction mixture containing the radiolabeled PTC RNA, aliquots of the mixture were quickly filtered through a 0.45-μm nitrocellulose filter at different time intervals as mentioned in Fig. 4A. The radioactive count retained on the filter was measured. The 0-s count shows the control (the RNA only).
In Vitro UV Cross-linking of PTC RNA-refolding Protein Complexes-For UV cross-linking the PTC RNA-"refolding protein" complexes, a protocol reported previously (19,20) was followed with minor modifications. Transcribed PTC RNA (concentration, 250 nM) in refolding buffer was kept on a glass plate in ice (reaction temperature, 0°C) in the UV chamber (GS Gene Linker, Bio-Rad) at a distance of 6 cm from the UV source. Denatured protein(s) (250 nM each) was added to it and crosslinked by UV irradiation at 254 nm.
In time course studies, cross-linking was started at different time points (as mentioned in Fig. 4) after addition of denatured protein at 0°C to the PTC RNA. The duration of each cross-linking reaction was 30 s. This way, the RNA-protein complexes were trapped at different times. At 0°C, the RNA binding and release take about 5 min for the denatured proteins, making it easy to have a cross-linking time course. The UV-irradiated samples were then precipitated using salt-ethanol and washed with 70% ethanol.
Mutagenesis of Selective Amino Acids in Protein-Site-directed mutagenesis was done using the QuikChange site-directed mutagenesis kit (Stratagene) on expression vector pET3a containing the full-length coding region of human carbonic anhydrase I (HCA-I) gene cloned at the NdeI and BamHI sites. The specific amino acid mutations mentioned in Fig. 6 and Table 3 were introduced. Each mutation was confirmed by sequencing.
Expression and Purification of Recombinant Carbonic Anhydrase and Its Mutants-E. coli BL21 (DE3) cells transformed with vector pET3a containing mutant or wild type carbonic anhydrase were allowed to grow to an A600 of 0.45-0.5, and the culture was induced by the addition of isopropyl 1-thio-β-D-galactopyranoside (final concentration, 0.4 mM) and ZnSO4 (final concentration, 0.5 mM). After 4-5 h of incubation at 30°C with constant shaking, bacterial cells were harvested by centrifugation (5000 × g) for 15 min at 4°C. Cell pellets were washed with buffer containing 25 mM Tris-Cl, pH 7.5 and 50 mM NaCl and resuspended in the required volume of buffer containing 50 mM Tris-HCl, pH 8, 100 mM NaCl, 1 mM DTT, and 0.01 mg/ml DNase I. The cell suspensions were sonicated using the 50% duty cycle in an ice-cold bath (with intermittent cooling) for 5 min, and cell extracts were centrifuged at 10,000 rpm for 20 min at 4°C; pellets were resuspended with an equal amount of the same buffer and used as samples for expression analysis by SDS-PAGE (shown in Fig. 6).
For purification of proteins, we used a protocol reported previously (24). All proteins were purified to apparent homogeneity as observed by a single band for 5 μg of protein in SDS-PAGE stained with silver. Purified proteins were stored in storage buffer (10 mM Tris-HCl, pH 7.5, 50 mM NaCl, and 8 mM MgCl2) containing 50% glycerol.
Unfolding and in Vitro Refolding of Proteins-For unfolding and refolding studies, we used a protocol from our laboratory reported previously (18-20). In short, 6 M guanidine hydrochloride was used to unfold protein, and the loss of secondary structure was confirmed by CD (17,20). For refolding, unfolded protein was diluted 100-fold in refolding buffer (50 mM Tris-HCl, pH 7.5, 10 mM Mg(OAc)2, and 100 mM NaCl), and the activity of refolded enzyme was assayed by adding para-nitrophenyl acetate to the refolding mixture (at different time points of refolding in the case of time course studies; otherwise kept for 30 min at 25°C before addition of para-nitrophenyl acetate). The increase of A400 with time was monitored at 25°C.
Energy Calculations for Thermodynamic Stability Prediction-For each of the three proteins (lysozyme, BCA, and ovalbumin) with known crystal structures, we considered the experimentally observed orders of release from PTC as the "correct" order and the other arbitrarily chosen orders as "random." The energies were calculated for the segment of the chain released, assuming the energy of the bound region to be zero. This approach is advantageous because it is not necessary to calculate the energy for the bound region. The energy computation for the bound state is difficult because its crystal structure is unknown. The energy change would then become the energy of the released polypeptide segment: $\Delta E = E_f - E_i$, where $\Delta E$ represents the change in energy on release of the bound portion, $E_f$ is the energy of the entire system (the bound polypeptide plus the released portion) after release, and $E_i$ is the energy of the system prior to the release. Hence, at every "release" time step, the energy of the total released portions is calculated.
Molecular dynamics simulations were carried out using the NAMD and VMD software (25,26). The particle mesh Ewald method was used for the electrostatic interactions, and a cutoff radius of 14 Å was used for truncation of van der Waals interactions. The starting configurations were energy-minimized using 1000 conjugate gradient steps followed by 1-ns equilibration and 1-ns production runs in the NVT (number of particles, volume, and temperature fixed) ensemble. The simulations were carried out at 298 K. The temperature was maintained using a Langevin thermostat. The equations of motion were integrated with a time step of 1 fs.
Data for Hydrophobicity Values-Test proteins were selected from the UniProt repository. Proteins are listed there, for example, in the following manner: P60709 (cytoplasmic actin), O00560 (syntenin), 1Q5Z859 (MAPK4), P04040 (catalase), P36871 (phosphoglucomutase), P26595 (α1-antiproteinase), Q9CF73 (γ-glutamylphosphate reductase), P37837 (transaldolase DAPK3_HUMAN), P00698 (LYSC_CHICK lysozyme), etc. For each of these proteins, the hydrophobicity values were collected using the tool ExPASy of the Swiss Bioinformatics Institute. The values are all given for amino acids from the N-terminal end of the protein to the C-terminal end. All the values were based on the Kyte and Doolittle scale. The window size was kept at 9. The fitting model was linear. In all cases, unless otherwise stated, the calculations were done from the N-terminal to the C-terminal end of the protein. In the case of the cytosolic proteins, the signal peptide was found for only one protein (lysozyme) and was removed prior to the calculations.
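A minimal sketch of this kind of calculation is given below, assuming an unweighted sliding-window mean of Kyte and Doolittle values (window of 9), a running sum from the N to the C terminus, and an ordinary least-squares line. The text does not spell out how the windowed values were accumulated or fitted, so the function names and these exact steps are assumptions for illustration, not the ExPASy implementation.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(seq, window=9):
    """Mean hydropathy of each full window, scanning from N to C terminus."""
    scores = [KD[residue] for residue in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def cumulative(values):
    """Running sum of the values, in the order given."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out


def linear_fit(ys):
    """Least-squares slope and intercept of ys against index 0..n-1 (n >= 2)."""
    n = len(ys)
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

For a sequence `seq` with any signal peptide already removed, `linear_fit(cumulative(windowed_hydropathy(seq)))` returns the slope and intercept of the cumulative profile under these assumptions.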

Isolation of Ribosome-bound Full-length Nascent Protein-The interaction of folding proteins with a set of nucleotides in the PTC has been shown for a number of proteins earlier (19,20). In this report, we show that a nascent protein interacts with the same set of nucleotides in the PTC in vivo. From the mode of ribosome binding and release of folding protein, we try to explain how protein folding takes place on the ribosome.
Ribosome nascent protein complexes have been isolated by a number of groups (27-29) to study intermediates in translation. Many of them used a few N-terminal amino acids of a protein as the nascent chain (27-29); others used full-length protein (18,30). We wanted to freeze the ongoing process of translation to trace the nascent protein on the ribosome. A C-terminal histidine-tagged rhizobacterial protein, HspH, was used for this purpose (18). This monomeric bacterial protein was expressed in E. coli. Translation was quickly stopped by chilling (Fig. 1A), the cell population was UV-irradiated as described under "Experimental Procedures," and the entire ribosome population from these cells was applied to the nickel column (18). From the column, a significant amount of the 50 S population and a small amount of the 70 S population co-eluted with the full-length HspH (18) because of cross-linking. The nascent HspH in the ribosomal RNA-bound population was freed by RNase treatment and detected by Western blotting with anti-penta-His antibody. Purified native HspH served as the length standard as shown in Fig. 1B.
To determine whether the cross-linked ribosomes are those that translated the protein or any free non-translating ribosome in the cell associating "in trans" with the nascent full-length protein, we did the following experiment. We selected a 23 S rRNA mutation 2251G→A that confers translation deficiency (31) but retains protein folding proficiency (Fig. 2A). This mutation also destroys a BsrI restriction site in the corresponding rDNA (Fig. 2B). It was introduced into a plasmid-borne rrnB operon under the control of the PL promoter. The promoter was kept repressed at 30°C by the chromosomally encoded cI857 temperature-sensitive repressor. As expected, ribosomes isolated from the cells grown at 42°C were both mutant and WT as evident when 23 S rRNA isolated from total ribosomes was sequenced. The cells carrying mutant 23 S rRNA plasmid were further transformed with plasmid containing the HspH gene, and the transformants, while growing at 42°C, were snap frozen followed by UV cross-linking. Starting with these cells, the ribosomal RNA was purified from the total ribosomes (input to nickel column) as well as from the nickel column-eluted fraction. A small segment of the 23 S rRNA (G2156 to G2454) was amplified by RT-PCR. The PCR products derived from the total ribosome preparation could be digested only partially with BsrI, whereas the products derived from the nickel column-eluted fraction could be digested completely (Fig. 2C). These results indicate that the ribosomes cross-linked to full-length proteins were purely wild type despite the presence of the folding-proficient, translation-deficient mutant ribosomes in the cell. Hence, the nascent protein associates in cis with the ribosome that translates it and not in trans with any folding-proficient free ribosome inside the cell (32).
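The logic of the BsrI test can be illustrated with a short sketch: a single G→A change inside a recognition site removes the site, so a PCR product carrying the mutation stays uncut. The recognition sequence ACTGG used here is an assumption (the text does not state it), and both toy sequences are hypothetical, not the actual rDNA sequence around position 2251.

```python
BSRI_SITE = "ACTGG"  # assumed BsrI recognition sequence; not given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_site(seq, site=BSRI_SITE):
    """True if the site occurs on either strand of the duplex."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


# Hypothetical toy amplicons (illustrative only).
wild_type = "TTGACTGGCCAT"  # carries one site
mutant = "TTGACTAGCCAT"     # same stretch with G->A inside the site

digestible = {name: has_site(seq)
              for name, seq in (("wild_type", wild_type), ("mutant", mutant))}
```

Under this reading, complete digestion of a PCR product means every template carried the site (wild type only), whereas partial digestion points to a mixture of wild type and mutant templates.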

Protein Folding by Ribosomes
Therefore, we isolated a major population of post-translational 50 S subunit bound to the full-length nascent HspH (Fig. 1A) and a small 70 S population that had just terminated translation, was still undissociated, and had an HspH C terminus accessible to nickel (18) (Fig. 1A). Some of the 70 S population could also be part of the polysome carrying a ribosome that just completed synthesis of a C-terminal His tag (Fig. 1A) (18).
Interaction of Nascent Protein with the rRNA Nucleotides of PTC-To study the nascent protein-rRNA interaction, the nickel column-purified ribosome population was first gently freed from ribosomal proteins by prolonged incubation with a low concentration of urea in ice. The 23 S rRNA was purified by gel filtration (Fig. 2A); presumably its PTC was cross-linked to the nascent HspH. It should be noted that none of the ribosomal proteins would cross-link with the PTC because they do not interact with it. We confirmed this by doing RT-PCR on the gel-filtered rRNA similarly freed from ribosomal proteins from UV-irradiated non-translating 70 S ribosome using the appropriate PTC-specific primer. A distinct band of the desired length was obtained, indicating that the synthesis of DNA by RT-PCR was not blocked by any ribosomal protein cross-linked to the PTC (Fig. 3A, lane 3). An RT-PCR product of the same length was obtained when 23 S rRNA was purified from non-UV-exposed 70 S ribosome by phenol extraction (Fig. 3A, lane 2) or when PCR was done on the cloned PTC DNA (Fig. 3A, lane 4). After end labeling, one of these primers was annealed to the RNA of the RNA-nascent protein complexes. This was extended by thermophilic reverse transcriptase at high temperature, and the products of chain extension were run in a sequencing gel. Five distinct stops were detected, and their positions were determined from the sequence of the same region of the RNA run next to them. Interestingly, the positions of these stops fully agreed with those observed in the case of the in vitro folding experiments where chemically unfolded proteins were cross-linked with the in vitro transcribed PTC RNA (19,20) (Fig. 3B). Apart from the five stops (as marked in Fig. 3B), we also found one at A2534, but this was neither seen reproducibly nor identified as an RNA-protein complex in mass analysis that clearly showed the amino acids of the nascent HspH interacting with the five strong binding sites in the PTC in vivo. The MALDI-TOF MS/MS analyses were done on the UV-cross-linked RNA-nascent protein complexes following the procedure reported earlier (19). The result of this study was compared with the in vitro folding counterparts shown in Table 1. For HspH protein, there was complete agreement between the in vivo and in vitro binding sites in the RNA-protein complexes. The amino acids of different proteins that interacted with the same nucleotide in the PTC RNA are not all identical, but they are not random either. In this small set of data, one RNA site (C2551) was recognized by the same amino acid (Asn) from all five proteins tested so far. In all 25 sites of the five proteins, the amino acids interacting with the PTC RNA were combinations of only Lys, Leu, Gln, Asn, and Gly.
Kinetics of PTC RNA-Unfolded Protein Interaction-Once we observed that the same set of PTC nucleotides interacts with the unfolded proteins in vivo and in vitro, we wanted to follow the time course of PTC-unfolded protein interaction to understand the mechanism of protein folding. The time course could not be followed in vivo. We took a number of unfolded proteins in vitro and looked at the time course of their binding and release from different sites of the PTC RNA in vitro. These were BCA, ovalbumin, and lysozyme, all of which showed interaction with the same set of PTC nucleotides described above (19).
Filter binding data using [α-32P]UTP-labeled PTC RNA showed that the binding and release of BCA from this RNA was complete within 100 s at 25°C (Fig. 4A). However, the activity gain of BCA followed its release from the PTC as shown in Fig. 4A. We trapped the short lived PTC RNA-folding protein complex (19) by UV cross-linking at 0°C at different time points after adding them together. To locate the protein binding sites on the RNA, we did primer extension with radiolabeled primer on those protein-cross-linked PTC RNA samples with thermophilic reverse transcriptase. The intensities of each of the primer extension stops in the autoradiogram changed with time as shown in Fig. 4, E, F, and G, for the three proteins. For each protein, the experiment was repeated a number of times. The variation of intensity of each block with time was calculated as described in Fig. 4, B, C, and D, for the three proteins. For each protein, the data from three such experiments are plotted in the figures. We used Quantity One software to calculate the band intensities of the reverse transcriptase stops at different sites (19). That the UV cross-linking was not saturated and there was no degradation of the RNA in the process is ensured by the presence of significant band intensity of the reverse transcript at the full-length RNA position in each lane. It is clear from Fig. 4 that among the five PTC RNA nucleotides (19,20) (Table 1), U2473 and U2491 released the folding protein earlier than the nucleotides C2551, A2560, and A2587. Hence, around 80 amino acids in BCA (serine 1 to aspartic acid 80), 50 in lysozyme (arginine 14 to cystine 64), and 85 in ovalbumin (glycine 1 to aspartic acid 85), all at the N termini, were freed first. The time courses were not the same for all three proteins.
The binding of five specific amino acids and nucleotides could divide the polypeptide into six segments (Fig. 4H), which could form secondary structures independently, at the usual fast rate. The orderly release of each of the nucleotide-amino acid pairs would allow respective secondary structures to fold, in the order of release, to the tertiary level.
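As a small illustration of how five bound residues partition a chain into six segments, the sketch below splits a polypeptide of a given length at five clamp positions. The function name, the convention that each clamp residue closes the segment preceding it, and the example positions are all assumptions made for illustration; they are not taken from the data.

```python
def split_by_clamps(length, clamp_positions):
    """Split residues 1..length into segments bounded by clamp residues.

    Returns 1-based inclusive (start, end) ranges, N to C terminus. Each
    clamp residue is counted as the last residue of the segment before it
    (an arbitrary convention chosen for this sketch).
    """
    clamps = sorted(set(clamp_positions))
    if len(clamps) != len(clamp_positions):
        raise ValueError("clamp positions must be distinct")
    if not clamps or clamps[0] < 1 or clamps[-1] >= length:
        raise ValueError("clamps must lie within 1..length-1")
    segments, start = [], 1
    for clamp in clamps:
        segments.append((start, clamp))
        start = clamp + 1
    segments.append((start, length))
    return segments


# Hypothetical 100-residue chain with five clamps (illustrative positions).
segments = split_by_clamps(100, [10, 30, 50, 70, 90])
labels = dict(zip("ABCDEF", segments))
```

With five clamps the function always returns six ranges, which can be labeled from the N to the C terminus.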
OCTOBER 26, 2012 • VOLUME 287 • NUMBER 44 • JOURNAL OF BIOLOGICAL CHEMISTRY 37513

Figure legend (fragment): (19). In L4 and L1, the PTC RNA region of 23 S rRNA was isolated from cells with nascent HspH UV-cross-linked to it and from non-translating UV-irradiated 70 S ribosome, respectively.

Energy Computation for the Pairwise Stability of Released Segments of Protein from the PTC RNA-The pairwise stability of the released segments should indicate whether the observed release order is energetically favorable. The five binding sites (as clamps) divide the polypeptide into six segments. An example is shown in the schematic in Fig. 4H where the clamps are designated by numbers 1, 2, 3, 4, and 5 from the N to the C termini of the proteins, dividing the polypeptide into six segments, A, B, C, D, E, and F. Here the order of release is shown as 2, 3, 4, 1, 5. The released segments would form secondary structures followed by formation of a tertiary fold in the sequence of their release to give rise to the native conformation. As shown in Table 2, interaction energies between segments in the order of their release were calculated by first calculating the potential energy of a region containing two or more released segments and then subtracting the potential energy of the individual segments contained in the released polypeptide. The energy calculation was done in the presence of water to simulate the intra-cellular condition. The results showed that the released regions in the experimental order were gradually going down to a lower energy level, which corroborates the folding funnel concept. We saw that making the release order arbitrary would make the drop in energy totally random (data not shown). The lowering of energy in the experimental release order has been shown for proteins ovalbumin, lysozyme, and BCA. All of them show a decrease in energy in accordance with the folding funnel landscape.
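The subtraction described here (potential energy of a region containing two or more released segments minus the potential energies of its individual segments) amounts to the following arithmetic. The function name and the numbers are hypothetical and serve only to show the bookkeeping; in the study the energies themselves came from the molecular dynamics runs.

```python
def interaction_energy(e_combined, e_segments):
    """Interaction energy between released segments.

    e_combined: potential energy of the region containing all the segments.
    e_segments: potential energies of the same segments taken individually.
    A negative result means the segments stabilize each other.
    """
    return e_combined - sum(e_segments)


# Hypothetical energies in arbitrary units (illustrative, not from Table 2).
e_pair = interaction_energy(-150.0, [-60.0, -70.0])
```

Evaluating this after each release step, in the experimental order and in an arbitrary order, gives the two energy series that are compared.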

TABLE 1 Interacting PTC nucleotides and amino acids of proteins
Amino acids of nascent and chemically unfolded proteins interacting with the PTC RNA are shown in bold. The conserved interacting amino acid asparagine (Asn) is shown in red.

In the experimental order, the release of the first segment was energetically extremely favorable as shown in Table 2. There could be a slight energy enhancement in the following steps, resembling a local minimum in the folding funnel landscape where the first step is a sharply falling trough; a small increase in energy could cross the activation barrier to be able to avail a subsequent drop down the energy landscape. This appears to be the optimum strategy to get over the local minima.
Sequential Release of Protein from the PTC Reveals Controlled Collapse-In vitro kinetic studies suggest that the regions of proteins associated with U2491- and U2473-interacting amino acids were released first from the PTC RNA. These regions are the most hydrophilic (Fig. 5, A, D, and G) in the entire protein length and reside at the outer surface of the protein, forming a shell (shown in Fig. 5, B, E, and H). Within these hydrophilic outer surfaces, the relative hydrophobic regions of all proteins that are released later remain stabilized in the interaction network through intervening water molecules. Protein segments associated with the most conserved interacting pair, asparagine and C2551, were released next from the RNA. In the linear amino acid sequences, we found no position specificity for these asparagines, but in three-dimensional conformations, we found the asparagines strategically located at the loop joining a large β-sheet with a helix (Fig. 5, C, F, and I).

Figure 4 legend (fragment): Table 1) for each of the three proteins, ovalbumin (B and E), BCA (C and F), and lysozyme (D and G), interact with the same five PTC RNA nucleotides, A2587, A2560, C2551, U2491/2492, and U2473 (each plotted with its own symbol in black, red, and blue), with time are shown. E, F, and G show blocks of primer extension analysis, which was carried out on the PTC RNA-"refolding ovalbumin," PTC RNA-"refolding BCA," and PTC RNA-"refolding lysozyme" complexes, respectively (19).
If the large sheet region is relatively hydrophobic, its collapse may be controlled to ensure cooperative interaction between the β-strands to form the sheets rather than an unstructured hydrophobic collapse. We are looking into more proteins to see whether such a "nucleation" for β-sheet formation is common in the protein space. Protein segments flanking A2587 and A2560 were released last from the RNA; incidentally, they were the first to bind with the RNA. Hence, these regions, which appeared to be more hydrophobic, are protected first and released last from the RNA; active site amino acids of all proteins tested also lie in between these two sites. This is perhaps the final collapse of the hydrophobic interior that was initiated by the nucleation of β-sheet formation in the conserved asparagine-linked region. Collapse of the hydrophobic interior could be associated with the desolvation of water molecules trapped inside the inner surface of the hydrophilic exterior (see Fig. 7). According to a number of biophysical studies, this desolvation energy (33-35) thermodynamically stabilizes the final native form of the protein.
Mutation of Selective Amino Acids Important for Folding-To see whether the amino acids binding to specific nucleotides in the PTC are crucial for folding, we mutated two such amino acids of HCA to check the fate of the protein in vivo. When we mutated the conserved asparagine (that interacts with C2551) to lysine (basic) or glutamic acid (acidic), the protein precipitated (Fig. 6A), but changing it to glutamine (similar to asparagine) did not change its solubility. When we mutated a glutamine (Gln) of HCA interacting with A2560 to glutamic acid (Glu; acidic), the protein again aggregated (Fig. 6B). The results of the contrasting mechanical approach of inserting mutations in active sites or random coil regions are compared in Tables 3 and 4 (36, 37). We see from Table 4 that of 13 mutations introduced in the random coils only one made HCA insoluble. As shown in Table 3, none of the mutations introduced in the active site in HCA reduced its solubility. However, both mutations introduced in the PTC binding sites made the protein insoluble.
A refolding experiment was carried out with both the wild type and mutant proteins. Mutant proteins having less than 5% activity were not included in the folding experiments. In the absence of ribosomal folding modulators, wild type and each of the mutant proteins could regain 25% of their respective native activity, which increased to ~70% in the presence of PTC RNA (Table 3). It should be noted that all of these mutants were soluble proteins, and they did not form inclusion bodies. The activities of native proteins did not change when incubated with the ribosomal components (19).
Ribosome-assisted Folding in Presence of Chaperones-We demonstrated previously that the association of DnaK with the 50 S ribosome increases the rate of folding, although the final yield of refolded protein did not change (17). The observation was similar both in vitro and in vivo when experiments were done using corresponding mutants of E. coli (17). In vitro, binding of unfolded protein to the 50 S ribosome releases DnaK (17), suggesting a conformational change in the PTC region of the 50 S ribosome where unfolded protein interacts. All of these observations probably indicate that the well known ribosome-associated chaperones like DnaK/DnaJ or trigger factor get access to the nascent (folding) protein only after it is released from the rRNA, and the process appears to be slow (6-9, 14, 22, 23) compared with the ribosomal subunit dissociation in the case of cytosolic globular proteins (18). Exposed hydrophobic patches of chaperones provide a favorable environment for the nascent chain coming out into the cytosol to fold (see "Discussion").

DISCUSSION
We have shown that the nascent protein interacts in vivo with the rRNA nucleotides of the PTC identically to how the chemically unfolded protein interacts with the 23 S rRNA or in vitro transcribed PTC RNA. Earlier we showed that the PTC-specific antibiotics affect the activity gain of full-length nascent protein in the cell identically to how they affect chemically unfolded proteins in vitro in the presence of the above folding modulators, the 23 S rRNA, or transcribed PTC RNA (6-9, 13, 14, 17-20). Hence, the nucleotides of the PTC have specificity for interaction with the amino acids of proteins. This property has been well preserved as an evolutionary relic in the rRNA sequence of PTC (6-20). Because of the difficulty in doing in vivo kinetic experiments, we used in vitro transcribed PTC RNA and pointed out a controlled collapse of the globular proteins mediated by the five PTC RNA nucleotide stretches. The rRNA nucleotides (A2587, A2560, C2551, U2491, and U2473) that interact with folding proteins are present inside the ribosome in the wall of the proposed peptide exit tunnel (32, 38) (Fig. 7) toward the active site of the PTC from the narrow constriction formed by ribosomal proteins L4 and L22. Among them, A2587, A2560, and C2551 are present at the A loop of the domain V rRNA (39), whereas U2473 and U2491 are present close to the P loop (39). This region of rRNA contains a number of hydrophobic crevices that provide a considerable hydrophobic surface in this part of the tunnel (Fig. 7). Among them, one is at the PTC active site (A site crevice) formed by bases A2451 and A2452 and the top of the base pair between A2453 and U2500 (40). Two others are relatively buried toward the narrow constriction of the tunnel, one of which is formed by the bases of A2058 and A2059 and the top of the base pair between G2057 and C2611 (40), whereas the other is formed by domain II nucleotides U790 and A752 and domain IV nucleotides C1781 and U1782 (41). The remaining part of the peptide exit tunnel after the narrow constriction toward the exit site appears to be more hydrophilic (42) where the nascent protein first encounters the cytosolic environment. As reported in the experiments above, the PTC RNA-mediated controlled collapse of three globular proteins takes place more or less sequentially from N to C termini of the proteins (43, 44). Amino acids of proteins that constitute the hydrophilic exterior in their native structure are present at the N terminus in all of them. Following this observation, we looked at more than 100 unrelated cytosolic single domain proteins and calculated for each of them the average of the algebraic sum of hydrophobic indices for every 10% of the amino acids from the N to C terminus.

TABLE 2
Interaction energies calculated within the released protein segments in presence of water
The table indicates the calculated interaction energies for protein segments of BCA, lysozyme, and ovalbumin released from the PTC RNA in the order observed experimentally.
Columns, for each of BCA, lysozyme, and ovalbumin: Portions released; Energy of system in water.
If we assume that a polypeptide folds directionally from the N to C terminus, then we can draw a cumulative plot for more than 100 proteins as shown in Fig. 8. The first point in this plot is the average of the hydrophobic indices for the first 10% of the amino acids; the second point is the same for the first 20% of the amino acids, and so on. This is in fact a three-dimensional feature of a protein presented on a linear scale. It is continuous and appears to be exponential, starting from being strongly hydrophilic at the N terminus to fairly hydrophobic at the C terminus. We took the average hydrophobicity for 10% because we found it to be the optimal length (through trials from a segment length of one to higher numbers of amino acids) of segment that gave remarkable agreement in the cumulative plot for more than a hundred proteins. A plot with the window shift of a single amino acid was not used because a number of amino acids of proteins make secondary structures that superfold to the tertiary level. The tertiary folding is not through sequential interaction of individual amino acids from N to C termini of proteins. The identified rRNA nucleotides important for folding temporally divide the proteins into segments with characteristic physicochemical nature (45). This segmentation seems to be identical for the nascent chain that is being synthesized as well as for chemically unfolded protein containing minimal secondary structure (20,39). The kinetics of folding of the polypeptide chain in the tunnel appears to be slow and postsynthetic as reported by a number of investigators (22,23). It has been reported by Brimacombe and co-worker (11) that the N terminus of the nascent peptide could touch a number of nucleotides in the PTC and finally get back to the PTC before going out of the tunnel through domains IV, III, II, and I. 
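The cumulative plot described above (point k is the average hydrophobic index over the first k x 10% of the residues) can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale, which is our choice for illustration only, because the text here does not name the scale used; the function names `cumulative_hydrophobicity` and `average_profile` are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy indices. ASSUMPTION: the text does not name the
# hydrophobicity scale it used; this common scale is for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cumulative_hydrophobicity(sequence, n_points=10):
    """Mean hydropathy index over the first 10%, 20%, ... of a sequence.

    The sequence is read from N to C terminus (one-letter codes).
    Residues outside the 20 standard amino acids raise KeyError.
    """
    indices = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if not indices:
        raise ValueError("sequence is empty")
    n = len(indices)
    points = []
    for k in range(1, n_points + 1):
        # Number of residues in the first k/n_points of the chain (rounded up).
        end = max(1, (k * n + n_points - 1) // n_points)
        points.append(sum(indices[:end]) / end)
    return points


def average_profile(sequences, n_points=10):
    """Average the cumulative profiles of several proteins point by point."""
    profiles = [cumulative_hydrophobicity(s, n_points) for s in sequences]
    return [sum(column) / len(column) for column in zip(*profiles)]


if __name__ == "__main__":
    # Toy sequence (not a real protein): hydrophilic N-terminal half,
    # hydrophobic C-terminal half.
    for fraction, value in enumerate(cumulative_hydrophobicity("KKKKKIIIII"), 1):
        print(f"first {fraction * 10:3d}% of residues: {value:+.2f}")
```

For the toy sequence the profile stays strongly negative over the hydrophilic half and rises once hydrophobic residues enter the running average, which is the qualitative shape the text reports for Fig. 8. A different hydrophobicity scale would change the numerical values.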
We showed previously (18-20) that no other domain of the 23 S rRNA is actively involved in the folding of nascent and chemically unfolded proteins. The folding process requires that the PTC is accessible and cannot be completed during the polypeptide synthesis. We (6-9, 14, 22) and Hartl and co-workers (23) observed a long post-synthetic delay in the folding (activity gain) of β-galactosidase. A large population of the 50 S subunit-bound full-length nascent protein (18) in fact indicates the slow release of nascent protein through the 50 S subunit following post-translational dissociation of the 70 S ribosome. The rate of translation, its termination, and ribosomal subunit splitting appear to be much faster, suggesting a "recycling" of the 50 S subunit by the full-length nascent protein. These reports can be incorporated in the following kinetic model of folding. From the concurrence of the data on in vitro and in vivo interactions between the full-length nascent and chemically unfolded proteins with the PTC nucleotides, we see that in the progressing polypeptide, nucleations (47, 48) could take place at about five or more sites, which are all in the random coil regions of the polypeptide chains. The nascent protein chain is divided into segments that are released sequentially, thereby enormously reducing the topological problems of folding. Based on the experiments mentioned above, in which we and others introduced mutations in these putative nucleation sites and elsewhere in human carbonic anhydrase, the strength of this nucleation appears to determine the fate of the protein in vivo. The mutations at nucleation sites inhibited folding more strongly, forming inclusion bodies, than mutations elsewhere (Tables 3 and 4). Once nucleation is achieved, the rest of the folding should depend on the environment the polypeptide encounters in the long exit tunnel. The hydrophobicity and hydrophilicity of the tunnel are described above.
The intramolecular interaction within the moving hydrophilic N-terminal head of the polypeptide (11) would only be restricted in the hydrophobic region from the PTC toward the tunnel constriction, and the hydrophobic C terminus would be more or less structureless. Because we know that there is a reverse gradient of hydrophilicity from the N to the C terminus, the increasing water content in the tunnel at the exit site should favor the release of the N terminus through the interaction with its hydrophilic regions. As shown in Fig. 4, the N terminus of the nascent chain interacts with the rRNA nucleotides U2473 and U2491/2492 before proceeding toward the tunnel interior. It probably helps to build the optimal secondary conformation in the linear protein sequence (α-helix or β-strand compatible with the dynamic nature of the 10-20-Å diameter tunnel) needed to release the hydrophilic exterior of the globular protein first into the aqueous environment of cytosol through the tunnel exit (Fig. 7). As soon as the hydrophilic region of the protein finds the hydrophilic environment of folding near the tunnel exit, it becomes structured as seen on the outside of the globular protein crystals. Once that is over, the unfolded hydrophobic length moves sequentially unobstructed through the constriction. Our study shows that the bulk of the hydrophobic region of the protein remains bounded by the amino acids in random coils that interact with the A2560 and A2587 nucleotides (Fig. 4). The hydrophobic crevices of the tunnel near the PTC add an extra advantage to maintain that conformation in the nascent chain (41). Only this can ensure the unperturbed and gradual intralength interaction required, for example, for the formation of β-sheets covering a large length of the proteins, which we have seen here.
Therefore, when the protein enters the hydrophilic environment at the tunnel exit past the narrow constriction, it collapses into a hydrophobic globule that grows until the entire hydrophobic stretch enters the tunnel exit. From the crystal structures of our test proteins, we see that the amino acids in the hydrophobic globes are arranged radially outward from the center of the globes. This globe is eventually cradled inside the hydrophilic exterior surface of the globular proteins. This controlled collapse is associated with the desolvation in the internal surface of the hydrophilic exterior (33-35) to thermodynamically stabilize the final native structure of the protein. However, the final tuning in structure takes place outside the ribosome. The last released hydrophobic regions of polypeptides are probably tethered by their interaction with exposed hydrophobic patches of the ribosome-associated molecular chaperones like trigger factor, DnaK/DnaJ, etc. in the aqueous cytosolic environment. The effects of these interactions are seen on proteins whose activity may increase or decrease depending on the outcome of interaction with the chaperones. Finally, we must mention that the order of release of peptide segments of proteins from the PTC was not very distinct for sites having similar band intensities. But overall, the group of bands disappearing early (early release) and late (late release), which constituted the outer and inner parts, respectively, of the three-dimensional conformation of proteins, could be identified unambiguously.
Therefore, we demonstrate that the information for correct folding of a protein is inherent in its linear amino acid sequence as proposed by Anfinsen (49). Moreover, the nucleating interactions of nascent protein with the PTC nucleotides of rRNA offer a solution to the concerns expressed in Levinthal's paradox (46). We are in the process of deciphering common motifs in the tertiary structures of proteins that will strengthen our claim that all proteins have a similar gradient of charge distribution of amino acids (hydrophobicity) from N to C termini.