Genetic Code-guided Protein Synthesis and Folding in Escherichia coli

Background: Synonymous codon usage affects protein properties in a given organism.
Results: A total of 342 antibody codon variants were identified, differing significantly in solubility and functionality while retaining the identical original amino acid sequence.
Conclusion: Genetic codes control protein synthesis and folding. “Codon-preferred” DNA template(s) can be generated by functional screening.
Significance: Protein properties can be considerably altered by synonymous codons without substituting amino acids.

The universal genetic code is degenerate, with 61 codons specifying 20 amino acids, thus creating synonymous codons for a single amino acid. Synonymous codons have been shown to affect protein properties in a given organism. To address this issue and explore how Escherichia coli selects its “codon-preferred” DNA template(s) for synthesis of proteins with required properties, we have designed synonymous codon libraries based on an antibody (scFv) sequence and carried out bacterial expression and screening for variants with altered properties. As a result, 342 codon variants have been identified, differing significantly in protein solubility and functionality while retaining the identical original amino acid sequence. The soluble expression level varied from completely insoluble aggregates to a soluble yield of ∼2.5 mg/liter, whereas the antigen-binding activity ranged from no binding at all to an affinity greater than that of the original scFv (10⁻⁸ M). Not only does our work demonstrate the involvement of genetic codes in regulating protein synthesis and folding, but it also provides a novel screening strategy for producing improved proteins without the need to substitute amino acids.

Proteins are encoded by genetic codes stored in DNA. The ribosome, the "protein synthesis machinery," deciphers codons aligned along mRNA to synthesize a specific polypeptide, which then folds into a defined structure/conformation (1). It has long been believed that the amino acid sequence contains all of the essential information required for folding the protein into a specific three-dimensional structure under appropriate conditions, although the details remain unknown (1). Recent studies have revealed a co-translational protein folding mechanism in which the nascent polypeptide starts to fold immediately after it emerges from the ribosomal tunnel and subsequently folds into a final state with the assistance of molecular chaperones and the ribosome itself (2,3).
There are 61 universal codons encoding 20 amino acids, hence offering cells the flexibility to select synonymous codons for translating a polypeptide (4). Interestingly, synonymous codons are non-randomly distributed along genes, and the frequency with which an individual synonymous codon is chosen to specify a protein exhibits a "favored" usage bias in a given organism (5). Moreover, codon usage bias varies significantly between different organisms (6), and attempts at producing proteins in heterologous cells often result in poor synthesis or the formation of insoluble aggregates. By prediction, >40% of human genes would not be expressed, or would be expressed very poorly, when transformed into Escherichia coli (5). However, synonymous substitution of foreign genes to mimic E. coli codon usage bias ("codon harmonization") has improved protein synthesis, although the success of this approach is variable and unpredictable (7).
Synonymous codons also influence the function of proteins. Naturally occurring silent mutations have been found to affect protein folding (8), alter substrate recognition (9), and trigger various diseases (10), suggesting that synonymous codons might control the folding of nascent polypeptides emerging from the ribosome by regulating polypeptide elongation rates (4,11). Indeed, mRNAs composed of different synonymous codons were translated at different rates (12,13), and the choice of rare codons, the availability of corresponding tRNAs, and adjacent codon pairs were shown to slow down protein synthesis on ribosomes (14). Moreover, identical DNA sequences could generate polypeptides with different secondary structures when the translation speed was altered (11). However, a recent investigation using ribosome density profiling technology suggested that the presence of codons with rare tRNAs did not decrease the translation rates (15). A separate study also indicated that tRNA gene numbers were not completely responsible for the codon usage bias in certain amino acids (16).
Thus, it is clear that genetic codes contain additional information beyond amino acid sequences, but how DNA sequences govern protein synthesis is not clear. To address this issue and investigate how E. coli selects its codon-preferred DNA template(s) for synthesis of proteins with required properties, we have designed synonymous codon libraries based on a human single-chain anti-IgE antibody (scFv) template (17). Using bacterial expression, functional screening, and DNA sequencing, we have identified 342 codon variants differing significantly in protein solubility and antigen-binding activity. To our knowledge, this is the first time that such a comprehensive study of synonymous codon effects on protein properties has been carried out in E. coli.

EXPERIMENTAL PROCEDURES
Library Design and Construction-A human anti-IgE scFv composed of 258 amino acids, including a flexible linker (GGGS)4 and a C-terminal His6 tag, was used as the template (17) to construct the synonymous codon libraries. The E. coli B Codon Usage Database was chosen as a guide to design the codon mix that contained high, medium, and low usage frequencies (Table 1). Oligonucleotides with mixed nucleotides at the third position of each codon were synthesized chemically by Invitrogen. Eighteen oligonucleotides were made for the VH library and 17 oligonucleotides for the VL library (Table 2). They were assembled by PCR using Pfu DNA polymerase (Promega) as follows: an initial PCR was carried out for 20 cycles (94°C for 30 s, 55°C for 30 s, and 68°C for 5 min), followed by a second PCR for a further 25 cycles (94°C for 30 s, 55°C for 30 s, and 68°C for 1 min) after adding the upstream and downstream flanking primers. A final extension was carried out at 68°C for 10 min.
E. coli Cloning, scFv Expression, and Preparation-PCR products were digested with the restriction enzymes NdeI and EcoRI and ligated into the plasmid pET22b (Novagen). E. coli BL21(DE3) (Novagen) was transformed for periplasmic expression of the scFv. For statistical analysis, four replicate clones of each variant were grown in 2×YT medium containing 50 μg/ml carbenicillin at 37°C overnight. The overnight culture was then diluted 1:20 and grown at 37°C for a further 2 h prior to the addition of isopropyl 1-thio-β-D-galactopyranoside at a final concentration of 1 mM. After induction at 30°C for 3 h, the A600 of the culture was measured and adjusted by dilution to give an equal number of E. coli cells in each sample before microcentrifugation at 13,000 rpm and 4°C for 1 min. To extract soluble scFv, the bacteria were treated with BugBuster (Novagen) containing 1 μg/ml DNase I on ice for 1 h, followed by high speed centrifugation at 13,000 rpm for 15 min at 4°C. Any insoluble scFv in the pellet was solubilized using 10% Sarkosyl (18) and centrifuged at 13,000 rpm for 15 min at 4°C. The total scFv expression was obtained by adding the Sarkosyl-solubilized scFv and the BugBuster-extracted scFv together. To purify soluble scFv for ELISA analysis, nickel-agarose (Novagen) was used as described (17). For UV CD analysis, scFv was first purified on protein L-agarose (Capto L, GE Healthcare) following the manufacturer's instructions and then on nickel-agarose as described (17).
Analysis of scFv Solubility and Antigen Binding-scFv solubility was analyzed by a sandwich ELISA in which microtiter wells were coated with purified rabbit polyclonal antibodies against the IgE scFv at 1 μg/well at 4°C overnight. After blocking with 1% BSA for 2 h, either purified or non-purified soluble scFv was added to the wells (100 μl/well in duplicate) and incubated at 30°C for 1 h. After three washes, Sigma HRP-coupled monoclonal anti-His6 antibody (1:6000) was added (100 μl/well), and the incubation was continued for 1 h at 37°C. The HRP activity was then developed by addition of 100 μl of 3,3′,5,5′-tetramethylbenzidine (TMB) liquid substrate (Sigma) for 5-10 min at 37°C. Finally, the reaction was stopped with 100 μl of 1 N HCl, and the microtiter wells were read at 450 nm.
The antigen-binding activity of the scFv was examined by a procedure similar to that described above, except that the microtiter wells were coated with the antigen IgE at 0.1 μg/well. In brief, after blocking with 1% BSA, scFv was added to the wells and detected using the Sigma HRP-coupled monoclonal anti-His6 antibody. PBS alone and extracts of bacteria without the scFv plasmid were used as negative controls in each experiment.
The relative solubility and antigen-binding activity of each individual scFv were calculated through a side-by-side comparison with the original scFv on the same microtiter plate using the following formula: (A450 of a variant − A450 of the negative control)/(A450 of the original scFv − A450 of the negative control) × dilution factor × 100%.
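The formula above can be written as a small helper. This is a minimal Python sketch for illustration only; the function name, argument names, and plate readings are hypothetical and are not taken from the study.

```python
def relative_activity(a450_variant, a450_original, a450_negative, dilution_factor=1.0):
    """Relative solubility or antigen binding of a variant, in percent of the original scFv.

    All A450 values must come from the same microtiter plate.
    """
    denominator = a450_original - a450_negative
    if denominator <= 0:
        raise ValueError("original scFv signal must exceed the negative control")
    return (a450_variant - a450_negative) / denominator * dilution_factor * 100.0


# Hypothetical readings, not data from the paper.
print(relative_activity(a450_variant=0.85, a450_original=0.45, a450_negative=0.05))
print(relative_activity(a450_variant=0.45, a450_original=0.45, a450_negative=0.05,
                        dilution_factor=2.0))
```

A variant whose background-corrected signal equals that of the original scFv scores 100%; the dilution factor scales the result back up when the variant sample was diluted before the assay.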
UV CD Analysis-UV CD measurements were performed using a JASCO J-810 spectropolarimeter calibrated with ammonium D-10-camphorsulfonate. Far-UV CD analysis used the following parameters: cell length, 0.1 cm; bandwidth, 1.0 nm; response time, 8 s; scanning speed, 50 nm/min; measurement range, 260-190 nm. To improve the signal to noise ratio, each spectrum was the average of four scans. Structure analysis was done using the CDSSTR software (DichroWeb). Protein concentration for the scans was 0.1 mg/ml in a buffer of 50 mM sodium phosphate (pH 7.2). Near-UV CD was measured as follows: cell length, 1 cm; bandwidth, 1.0 nm; response time, 2 s; scanning speed, 20 nm/min; measurement range, 320-250 nm; each spectrum was the average of four scans. Protein concentration was 0.5 mg/ml in a buffer of 50 mM sodium phosphate (pH 7.2).
The abbreviation used is: scFv, single-chain antibody fragment.

Effects of Synonymous Codons on Protein Properties in E. coli
Statistical Analyses-Statistical analysis was carried out using IBM Statistical Product and Service Solutions (SPSS) 19. Data were expressed as means ± S.D. Analysis of variance was used to analyze the variation between samples.
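For readers without access to SPSS, the two quantities named above (mean ± S.D. per clone and a one-way analysis of variance between clones) can be sketched in plain Python. This is an illustrative assumption about how such a comparison might be set up, not the procedure actually run in SPSS; the function name and the replicate values are hypothetical, and the sketch returns only the F statistic and degrees of freedom (a p value would additionally require the F distribution).

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` is a list of lists, one list of replicate measurements per clone.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative solubility values (%), four replicate clones per variant.
replicates = {
    "original": [98.0, 101.0, 100.0, 101.0],
    "variant_a": [150.0, 145.0, 155.0, 150.0],
    "variant_b": [20.0, 25.0, 22.0, 21.0],
}
for name, values in replicates.items():
    print(f"{name}: {mean(values):.1f} ± {stdev(values):.1f}")
print(one_way_anova_f(list(replicates.values())))
```

The function raises `ZeroDivisionError` if all replicates within every group are identical, because the within-group variance is then zero.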
Bioinformatics Analysis-Codon adaptation index and GC3 analyses were carried out using the software CodonW (SourceForge), and the codon usage bias database was also used. tRNA adaptation index analysis was based on the genomic tRNA database (tRNAscan-SE Genomic tRNA Database). Free energy prediction of 5′ mRNA secondary structure was performed with the program from the University of Rochester Medical Center.
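As an illustration of the simplest of these measures, GC3 is the fraction of codons carrying G or C at the third position. The Python sketch below is an assumption-laden stand-in, not CodonW itself: the function name and the `synonymous_only` option are hypothetical, and CodonW's own statistic may be defined differently (for example, by excluding codons that have no synonymous alternative), so values need not match its output exactly.

```python
# Codons with no synonymous alternative in the standard code, plus stop codons.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3(cds, synonymous_only=False):
    """Fraction of codons in a coding sequence with G or C at the third position.

    If synonymous_only is True, Met (ATG), Trp (TGG), and stop codons are skipped.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if synonymous_only:
        codons = [c for c in codons if c not in NON_SYNONYMOUS]
    if not codons:
        raise ValueError("no codons left to score")
    return sum(c[2] in "GC" for c in codons) / len(codons)


# Two hypothetical synonymous sequences encoding Met-Gly-Ser-Trp.
print(gc3("ATGGGTTCTTGG"))
print(gc3("ATGGGCTCCTGG"))
```

Because only third positions are counted, two variants encoding the same protein can differ widely in GC3, which is what makes it usable as a per-variant descriptor.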

RESULTS
Design and Construction of Synonymous Codon Libraries-We designed the synonymous codon libraries based on a single-chain anti-IgE (scFv) template and substituted the third position of every codon using degenerate oligonucleotides (Table 1). Because the substitution of all codons across the full-length DNA would yield a library size beyond the limit of E. coli transformation efficiency, we constructed two sub-libraries, one for the heavy chain (VH) and one for the light chain (VL). To reduce the library size further, each codon was replaced only by a high, a medium, and a low usage frequency codon, rather than by all possible codons (Table 1). PCR was used to introduce the designed synonymous codon mix into the scFv template. A His6 tag was engineered at the C terminus for scFv detection and purification. To verify the generation of synonymous codon libraries, we directly sequenced the PCR constructs before E. coli cloning, which showed successful substitutions of individual codons at the third position for the VH library (Fig. 1).
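The size constraint described above can be made concrete with a short calculation. The Python sketch below is illustrative and is not the authors' design procedure: it uses the standard genetic code, treats only third-position changes as allowed, and caps each position at three codons to mimic the high/medium/low choice. The function names, the 12-nucleotide example sequences, and the cap of three are assumptions; the actual codon mix is the one defined in Table 1.

```python
from itertools import product
from math import log10, prod

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def third_position_synonyms(codon):
    """Codons that share the first two bases and encode the same amino acid."""
    aa = CODON_TABLE[codon]
    return [codon[:2] + b for b in BASES if CODON_TABLE[codon[:2] + b] == aa]


def max_library_size(cds, codons_per_position=3):
    """Upper bound on distinct variants when each codon has a capped number of choices."""
    return prod(min(len(third_position_synonyms(c)), codons_per_position)
                for c in split_codons(cds))


template = "ATGGGTTCTTGG"   # hypothetical fragment encoding Met-Gly-Ser-Trp
variant = "ATGGGCTCATGG"    # differs only at third positions
print(translate(template) == translate(variant))  # same protein from both templates
print(max_library_size(template))
# Order of magnitude if all 258 codons of the scFv had three choices: 10^N with N =
print(round(258 * log10(3)))
```

Even with three choices per codon, the bound for a 258-codon template is on the order of 10^123 combinations, far more than any transformation can sample, which is consistent with the decision to split the template into VH and VL sub-libraries and to screen rather than enumerate.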
E. coli Expression and Screening for Altered Protein Synthesis-Both the VH and VL synonymous codon libraries were cloned into E. coli. DNA sequencing was carried out, leading to the identification of 342 variants containing synonymous codons at various positions while maintaining the identical, original amino acid sequence. These clones were expressed, and the synthesized scFv was confirmed by Western blotting (data not shown). Soluble scFv was extracted and subjected to a sandwich ELISA in which the soluble scFv was captured on the well by precoated rabbit polyclonal antibodies against the scFv. With the His6 tag at the C terminus, the captured scFv was detected by an HRP-linked anti-His6 antibody. The relative soluble level of scFv was then measured through a side-by-side comparison with the original clone, which generally yielded the scFv at a level of ∼1 mg/liter (17). Our results revealed a wide range of protein solubility among the variants, from completely insoluble aggregates to a soluble yield of ∼2.5 mg/liter (Fig. 2 and supplemental Table S1). To examine whether synonymous codons influence total protein expression, the insoluble scFv from the pellet was also monitored by solubilizing the inclusion bodies with 10% Sarkosyl (18), followed by sandwich ELISA analysis. This showed that the total scFv expression (insoluble scFv + soluble scFv) also varied between the variants (supplemental Table S1). Western blotting confirmed the scFv expression, in agreement with the ELISA results (data not shown). However, there was no correlation between the total scFv expression and its soluble production level (supplemental Table S1).
OCTOBER 25, 2013 • VOLUME 288 • NUMBER 43

We also analyzed the mRNA by RT-PCR using 20 randomly selected clones (10 from the VH library and 10 from the VL library), detecting very similar amounts of mRNA among the clones despite their apparent differences in protein expression (data not shown). This result suggests that the mRNA level was not the major factor affecting scFv expression.

E. coli Expression and Screening for Altered Functionality-The antigen-binding activity of the scFv was examined by ELISA on wells coated with the antigen IgE. After binding, the bound scFv was detected by the HRP-linked anti-His6 antibody. As for solubility, the relative affinity of the scFv was measured by comparison with the original scFv on the same microtiter plate. This showed that the IgE-binding activity ranged from no binding at all to an affinity greater than that of the original scFv (10⁻⁸ M) (Fig. 2 and supplemental Table S1) (17). To further characterize the binding activity, four variants, together with the original scFv, were expressed and purified via the His6 tag on an affinity column. The purified scFvs were then adjusted by dilution to the same level, based on their sandwich ELISA signals, and used for antigen-binding assays. Fig. 3A shows that, whereas two clones showed no binding at all, the other two bound to the antigen with affinities ∼32- and 8-fold greater than that of the original scFv, respectively. This experiment was repeated, and similar results were obtained (data not shown). The sequences of the four variants were aligned with the original scFv, revealing no positions that could be identified as contributing to the alteration of the binding activity (Fig. 3B).
Relationship between scFv Solubility and Functionality-We noticed that scFv solubility was not related to antigen-binding activity (supplemental Table S1). Although highly soluble and active variants were obtained, many clones showed high solubility but low affinity or vice versa (Fig. 2 and supplemental Table S1). Notably, most substitutions were deleterious when compared with the original sequence (Fig. 2); in general, substitutions in the VH region appeared to affect the antigen-binding activity (Fig. 2A), whereas VL variants mainly influenced the solubility and decreased the affinity (Fig. 2B). Nevertheless, a number of variants with enhanced solubility and antigen-binding affinity over the original scFv were identified from both the VH and VL libraries.
Bioinformatics Analysis-Sequence alignment of the 342 codon variants was carried out to detect any possible associations between synonymous codons and their corresponding properties. This did not identify any regions or individual positions that contributed to the altered solubility and functionality of the scFv (Fig. 3B and data not shown). Substitutions in either the VH or the VL region could increase or decrease the solubility and affinity of the scFv (supplemental Table S1). We also analyzed codon adaptation measures, namely the codon adaptation index, the tRNA adaptation index, and GC3, which have been developed to define the relative "adaptiveness" of a codon and are used to predict protein expression from DNA sequences (19). Again, we did not detect any correlation with the protein properties, although there may be a weak association between GC3 and expression level in the VL library (data not shown).
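One simple way to test for the kind of association discussed above is to compute a correlation coefficient between a sequence-derived index and a measured property across variants. The sketch below uses the Pearson coefficient purely as an example; the text does not state which correlation measure was applied, and the function name and the numbers are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-variant values: a codon index and relative soluble expression (%).
index_values = [0.42, 0.55, 0.48, 0.61, 0.50]
expression = [80.0, 35.0, 120.0, 60.0, 95.0]
print(pearson_r(index_values, expression))
```

A coefficient near zero across many variants would correspond to the absence of a clear correlation reported here, whereas values approaching +1 or −1 would indicate a strong linear association.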
CD Spectroscopy-To assess the conformation/folding of variants, CD analysis was performed by comparing the original scFv with affinity-altered variants (Fig. 4). Far-UV CD analysis showed that all the selected scFvs exhibited distinct CD spectra with a negative band between 216 and 220 nm and a positive band between 195 and 200 nm, which are characteristic of β-strand content. However, they varied in intensity at the peaks at ∼220 nm and/or ∼195 nm, indicating that the protein secondary structure was slightly altered by the synonymous mutations (Fig. 4A). Using the CDSSTR software, we calculated the fractional changes in secondary structure for each individual variant (Fig. 4A). Near-UV CD was also carried out to analyze the environmental change of aromatic residues on the protein surface, such as tryptophan (Trp) at 288-293 nm, tyrosine (Tyr) at 275-282 nm, and phenylalanine (Phe) at 255-278 nm. Fig. 4B shows the near-UV CD spectra, which clearly reveal variation at ∼284 and 291 nm, most likely arising from Tyr and Trp residues located within or adjacent to the CDRs (Figs. 3B and 4B). These results provide a structural explanation for the altered affinity of the synonymous variants.

Interestingly, variants with no antigen-binding activity or a very low affinity could not be purified by protein L-agarose (data not shown). Perhaps these variants were unstable and formed aggregates on the column during the purification or they contained an altered conformation not recognized by protein L.

DISCUSSION
We have shown that an scFv encoded by different synonymous codons could be synthesized in E. coli with considerably changed solubility and antigen-binding activity while retaining the identical, original amino acid sequence. The significant difference in antibody affinity (Figs. 2 and 3 and supplemental Table S1) clearly suggests a structural/conformational alteration caused by synonymous codons. Our CD analysis of affinity-altered variants also detected structural/conformational changes (Fig. 4), directly demonstrating that the synthesis and folding of proteins in E. coli are indeed controlled by genetic codes. Our results also suggest that production of heterologous proteins in E. coli for structural and functional studies should be carried out with caution, because foreign proteins may not be faithfully reproduced in bacteria. Because the scFv was designed to be secreted into the periplasm of E. coli in a structurally loosely folded state (20), our results favor the previous proposal that synonymous codons only affect the encoded protein at the level of secondary structure (11).
It has been observed that codons with a high usage frequency are mainly located in structural regions, whereas rare codons are more likely to occur in β-strands, random coils, and domain boundaries (21). Studies on rare codon distribution have also suggested conservation between structurally related proteins from different organisms (8). However, our sequence analysis of 342 codon variants did not detect any individual regions or positions contributing to the altered protein properties (Fig. 3B and supplemental Table S1), suggesting that multiple codons across the entire sequence may act in a synergistic manner. Recently, GFP synthesis was significantly improved by synonymous substitutions of its N-terminal 40 nucleotides, which were proposed to form a specific 5′-mRNA secondary structure favoring translation initiation (22). However, our results clearly showed that an identical N-terminal sequence could produce the protein at either a high or a low yield, depending on the codon distribution in the downstream sequence (Fig. 3B and supplemental Table S1, the VL library), suggesting that the N-terminal sequence is not the sole or indispensable factor affecting protein expression, as also observed by others (23). Moreover, by using the same method as described in the GFP work (22), we could not detect any association between the ΔG of the 5′-mRNA and scFv expression or function (Fig. 5). Also, our codon adaptation index, tRNA adaptation index, and GC3 analyses failed to show any clear correlation between these indices and the scFv properties. Taken together, our work suggests that the effects of synonymous codons on protein properties cannot be accurately predicted, and that expressing heterologous proteins in E. coli using a computational design remains a challenge.
Our strategy of screening synonymous codon libraries offers a powerful novel tool for the identification of "codon-modified" DNA template(s) for synthesis of required variants. In this approach, various synonymous codons are combined along the DNA template to allow E. coli to choose its "preferred" template(s) for translating the protein with enhanced properties. The desirable variants can then be identified through screening for the required function/properties. Our successful discovery of enhanced codon variants, in particular the 32-fold increase in scFv affinity (Fig. 3A), verifies the feasibility of engineering proteins without the necessity of substituting amino acids. We have successfully applied this approach to a number of other proteins, highlighting its general applicability (data not shown). It remains to be tested whether synonymous codons can affect antibody specificity and stability. We envisage that the power of this method can be further enhanced if a high-throughput screening technique (e.g. a display method or a reporter gene) is incorporated or if the library diversity is further enlarged to contain all of the possible synonymous codons.
Modulating protein synthesis by genetic codes may have biological importance such as providing an additional control over protein synthesis. Recently, codon usage bias has been observed to influence cell cycle development (24), the responses of the cell to stress-specific conditions (25,26), and protein phosphorylation profile and stability (27). In addition, it was discovered that silent mutations could cause serious diseases (10) or affect frameshift in a given organism (28). Our strategy may be applied to address these issues and study disease genes.