Tracing Primordial Protein Evolution through Structurally Guided Stepwise Segment Elongation

Background: Evolutionary protein design provides a deeper understanding of how primordial proteins emerged.
Results: An evolutionary model is proposed on the basis of structurally guided stepwise segment elongation.
Conclusion: The structural guidance facilitates structural organization and gain-of-function of a generated 25-residue artificial protein.
Significance: This study provides insights into how primordial protein evolution may have been promoted by structural guidance.

How primordial proteins emerged has been a fundamental and longstanding question in biology and biochemistry. To better understand primordial protein evolution, we synthesized an artificial protein on the basis of an evolutionary hypothesis: segment-based elongation starting from an autonomously foldable short peptide. A 10-residue protein, chignolin, the smallest foldable polypeptide reported to date, was used as a structural support to facilitate higher structural organization and gain-of-function in the development of an artificial protein. Repetitive cycles of segment elongation and subsequent phage display selection produced a 25-residue protein, termed AF.2A1, with nanomolar affinity for the Fc region of immunoglobulin G. AF.2A1 shows such exquisite molecular recognition that it can distinguish between different conformations of the same molecule. The structure determined by NMR demonstrated that AF.2A1 adopts a globular protein-like conformation with the chignolin-derived β-hairpin and a tryptophan-mediated hydrophobic core. Sequence analysis and a mutation study showed that the structural organization and gain-of-function emerged in the vicinity of the chignolin segment, revealing that the structural support served as the core of both structural and functional development.
Here, we propose an evolutionary model for primordial proteins in which a foldable segment serves as the evolving core to facilitate structural and functional evolution. This study provides insights into primordial protein evolution and also presents a novel methodology for designing small-sized proteins useful for industrial and pharmaceutical applications.

A fundamental question in biology and biochemistry is how primordial proteins emerged and evolved into modern proteins with their own functions and structures (1)(2)(3)(4). Attempts to design artificial proteins provide a basis for a deeper understanding of the principles of primordial protein evolution (5)(6)(7)(8)(9). Designing a protein de novo, however, has remained challenging even for small proteins. This is due to the enormous size of protein sequence space, which makes it impractical to test all possible sequences by any conventional method. The same constraint applied to primordial protein evolution, because such proteins emerged under prebiotic conditions in which modern biological systems did not exist (3). It is therefore reasonable to suppose that primordial proteins must have evolved through mechanisms that searched sequence space efficiently. Reproducing such mechanisms with an artificial protein would provide a better understanding of the emergence of primordial proteins.
The vast size of sequence space implies that a long, functional polypeptide is unlikely to have emerged by chance; it must instead have evolved from smaller molecules along particular pathways dictated by physical constraints. Segment-based protein evolution, proposed in the context of exon shuffling (10,11), hypothesizes that short peptide segments were assembled into single polypeptides and then evolved into modern proteins with complex structures and functions. This hypothesis, although plausible, lacks sufficient empirical evidence to explain primordial protein evolution. In general, a short peptide segment is too flexible to fold into a specific structure, which makes folding and/or association with other peptide segments energetically unfavorable.
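To put the size of sequence space in numbers, the short sketch below counts the possible sequences for chains of a few lengths. It assumes, as a simplification, that each position can be any of the 20 standard amino acids with no further constraint.

```python
# Size of protein sequence space if each position can be any of the 20 standard
# amino acids (a simplification: no codon bias, no nonstandard residues).
ALPHABET_SIZE = 20


def sequence_space(length: int) -> int:
    """Number of distinct sequences for a chain of the given length."""
    return ALPHABET_SIZE ** length


for n in (8, 10, 25, 100):
    print(f"{n:>3} residues: {sequence_space(n):.2e} possible sequences")
```

Even a 25-residue chain has about 3.4 × 10³² possible sequences, far more than any library can sample exhaustively.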
Overcoming the inherent flexibility of peptides is indispensable for efficient segment-based evolution. The association of flexible peptide segments and their subsequent folding require compensation for a considerable entropic cost. To drive structural organization, primordial proteins may have used a small structural support such as metal coordination (7)(8)(9). We hypothesized that a foldable short peptide segment could also serve as a structural support to promote segment-based protein evolution. In contrast to the studies using metal-coordinated motifs, however, there have been few successful reports of segment-based de novo protein generation that does not rely on metal coordination, largely because short foldable peptide segments are scarce. Consequently, very little is known about potential evolutionary mechanisms driven by foldable peptide segments.
To better understand the principles of primordial protein evolution, we propose a new evolutionary hypothesis: structurally guided stepwise segment elongation starting from a foldable short peptide segment. We previously reported a 10-residue miniprotein termed chignolin (12,13) that autonomously folds into a rigid β-hairpin structure, whose structure and folding have been thoroughly characterized by NMR (12), x-ray crystallography (13), and molecular dynamics simulation (14-20). Chignolin therefore appears to be an ideal structural support for testing our evolutionary hypothesis. Using chignolin as a structural support, we attempted to synthesize an artificial protein by repetitive cycles of segment elongation and subsequent functional selection from T7 phage-displayed libraries. Rigorous functional and structural analyses revealed that the foldable chignolin served as the core for structural organization and gain-of-function. We discuss the mechanism of segment-based evolution guided by a structural support and its usefulness as a practical methodology for functional protein design.

EXPERIMENTAL PROCEDURES
Materials-The Fc region of human immunoglobulin G (IgG) used for phage display selection was purchased from Jackson ImmunoResearch. In addition to this commercially available Fc region, highly purified Fc of human monoclonal IgG (Chugai Pharmaceutical Co., Ltd.) was prepared using a Fab preparation kit (Pierce) followed by anion exchange chromatography on DEAE-Sepharose (GE Healthcare) and size exclusion chromatography on a Superdex 200 pg column (GE Healthcare). Synthetic peptides were purchased from Bio-Synthesis, Inc. Synthetic oligonucleotides were purchased from RIKAKEN (Japan).
Library Preparation and Phage Display Selection-The molecular evolution was carried out in two steps. The first generation library was constructed by elongating a randomized sequence of eight residues, Xaa8, at the C terminus of a chignolin-derived segment termed CLN. Each Xaa was encoded by the degenerate codon NNK (where N represents an equimolar mixture of A, C, G, and T, and K an equimolar mixture of G and T). The gene fragments were synthesized by overlap extension polymerase chain reaction (PCR), digested with the restriction enzymes EcoRI and HindIII, and ligated into the C-terminal part of the T7 phage coat protein gene 10 (g10) of the T7Select 10-3b vector (Novagen). In vitro packaging and amplification of phages were carried out in accordance with the T7Select system manual (Novagen). The phages were precipitated with polyethylene glycol (Mr 8000) and NaCl, collected by centrifugation (14,000 × g, 20 min), and resuspended in 1 ml of Tris-buffered saline containing 0.1% (w/v) Tween 20 (TBS-T). The phage library was biopanned against a biotinylated human Fc region (Jackson ImmunoResearch) immobilized on streptavidin MagneSphere paramagnetic particles (Promega). After 10 washes with TBS-T, bound phages were eluted with TBS containing 1% (w/v) SDS. The eluted phages were amplified and subjected to the next round of biopanning. After the final round, 96 clones were randomly isolated and assayed by ELISA (see below). The clone identified as having the highest affinity by ELISA and sequence analysis was used for the second generation library.
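The NNK scheme can be checked by enumerating its 32 codons against the standard genetic code: all 20 amino acids are encoded and TAG is the only stop codon. The sketch below assumes the standard code and treats TAG as a stop (an assumption about the host; an amber-suppressor strain would read it differently). From that it derives the fraction of Xaa8 inserts free of stop codons.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third position); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nnk_codons() -> list[str]:
    """All codons matching NNK (N = A/C/G/T, K = G/T)."""
    return [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]


def nnk_usage() -> Counter:
    """How many NNK codons encode each amino acid ('*' counts stop codons)."""
    return Counter(CODON_TABLE[c] for c in nnk_codons())


usage = nnk_usage()
n_codons = sum(usage.values())          # 32 codons in total
p_stop = usage["*"] / n_codons          # 1/32, TAG only
p_no_stop_xaa8 = (1 - p_stop) ** 8      # chance that an Xaa8 insert has no stop codon
print(len(usage) - 1, n_codons, round(p_no_stop_xaa8, 3))
```

Leu, Ser, and Arg each get three NNK codons, while most other residues get one or two, so NNK libraries are not perfectly uniform at the amino acid level.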
The second generation library was constructed by elongating Xaa8 at the N terminus of the clone selected from the first generation library. Gene fragments encoding the polypeptide library were inserted into g10 of the T7Select 10-1b vector (Novagen). Biopanning was carried out as described above.
Enzyme-linked Immunosorbent Assay (ELISA)-A 96-well microtiter plate, MEDISORP 96 (Nunc), was coated with T7 phages displaying each polypeptide by incubation for 1 h at room temperature. After five washes with TBS-T, binding to the Fc region was detected with a Sunrise reader (TECAN) using a horseradish peroxidase-conjugated human Fc region (Jackson ImmunoResearch) and ABTS One-Component HRP Microwell Substrate (BioFX).
Surface Plasmon Resonance (SPR) Assay-SPR assays were performed with a Biacore T100 (GE Healthcare). Human Fc region (1200 resonance units (RU); Jackson ImmunoResearch) was immobilized on a Series S sensor chip CM5 (GE Healthcare) via primary amine groups using N-hydroxysuccinimide and 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride. For a reference cell, another flow cell was blocked with 1 M ethanolamine. Binding assays were performed in HBS-T (10 mM HEPES (pH 7.4), 150 mM NaCl, 0.05% (w/v) Tween 20). The binding data were fitted to a 1:1 binding model to determine the equilibrium dissociation constant (KD) using Biacore T100 evaluation software (GE Healthcare). The temperature dependence of the affinity data was fitted with the nonlinear van't Hoff equation (Equation 1) (21) using IGOR Pro (WaveMetrics, Inc.).
where R is the gas constant; T is the temperature; T0 is an arbitrary reference temperature (e.g. 298.15 K); ΔH and ΔS are the binding enthalpy and entropy at T0; and ΔCp is the change in heat capacity.
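Equation 1 is not reproduced in this copy. For reference, a commonly used form of the nonlinear van't Hoff relation is given below, written with the symbols defined above. Expressing it through the dissociation constant KD is an assumption; the exact form in ref. 21 may differ, for example by using the association constant instead.

```latex
R\,T\ln K_D \;=\; \Delta G(T) \;=\; \Delta H - T\,\Delta S + \Delta C_p\left[(T - T_0) - T\ln\frac{T}{T_0}\right]
```

With ΔCp = 0 this reduces to the linear van't Hoff form, ln KD = ΔH/(RT) − ΔS/R.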
For binding to non-native conformers of IgG, acid-treated IgG was prepared by dialyzing human monoclonal IgG against 20 mM sodium acetate (pH 4.5) at 323 K. Heat-treated IgG was prepared by incubation at 343 K for 15 min in HBS-T buffer containing 2.7 mM KCl. Reduced Fc region was prepared by treating the purified Fc region with 50 mM 2-mercaptoethylamine (Pierce) at 310 K for 90 min. For SPR analysis, 3000 RU of AF.2A1 were immobilized on a CM5 sensor chip. Data were processed with Biacore T100 evaluation software (GE Healthcare).
Pulldown Assay-AF.2A1 (0.5 mg) was immobilized on 100 μl of N-hydroxysuccinimide-activated Sepharose HP (GE Healthcare) in accordance with the manufacturer's protocol. Crude cell lysate was prepared by sonicating a 1-ml culture of Escherichia coli strain BLT5403 in TBS-T. Acid-treated Fc region was prepared by dialyzing the Fc region against 10 mM Gly-HCl, 150 mM NaCl (pH 2.0) for 12 h. The crude cell lysate, with or without the acid-treated Fc region, was mixed with AF.2A1-immobilized Sepharose. After five washes with TBS-T, the bound Fc region was eluted with 50 mM NaOH. Sample purity was evaluated by SDS-PAGE.
Determination of Molecular Weight in Solution by Analytical Ultracentrifugation (AUC)-For AUC, sedimentation equilibrium data were obtained on a ProteomeLab XL-A (Beckman Coulter). AF.2A1 was dissolved in 10 mM sodium acetate-d4 buffer (pH 4.5). The plotted data are for 72 μM AF.2A1 at a rotor speed of 40,000 rpm. The experimental molecular weight was calculated from the fitted slope. The monodispersity of AF.2A1 was also confirmed by size exclusion chromatography (SEC), performed with an ÄKTA purifier (GE Healthcare) using a Superdex Peptide 10/300 GL column (GE Healthcare); AF.2A1 was dissolved in 10 mM sodium acetate-d4 buffer (pH 4.5) and injected onto the column.
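For a single ideal species at sedimentation equilibrium, ln c is linear in r², with slope M(1 − v̄ρ)ω²/(2RT). The helpers below convert between that slope and the molar mass M. The partial specific volume, solvent density, and run temperature are not given in the text, so the values in the self-check are hypothetical placeholders, as is the 3000 g/mol test mass.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def angular_velocity(rpm: float) -> float:
    """Rotor speed in rad/s."""
    return rpm * 2.0 * math.pi / 60.0


def mw_from_slope(slope_per_cm2: float, vbar_ml_per_g: float, rho_g_per_ml: float,
                  rpm: float, temp_k: float) -> float:
    """Molar mass (g/mol) from the slope of ln(c) versus r^2 (r in cm).

    Ideal single species: ln c = M (1 - vbar*rho) w^2 r^2 / (2 R T) + const.
    """
    slope_si = slope_per_cm2 * 1e4                  # cm^-2 -> m^-2
    buoyancy = 1.0 - vbar_ml_per_g * rho_g_per_ml   # dimensionless
    w = angular_velocity(rpm)
    return 2.0 * R * temp_k * slope_si / (buoyancy * w ** 2) * 1e3  # kg/mol -> g/mol


def expected_slope(mw_g_per_mol: float, vbar_ml_per_g: float, rho_g_per_ml: float,
                   rpm: float, temp_k: float) -> float:
    """Inverse of mw_from_slope: slope of ln(c) versus r^2 in cm^-2."""
    w = angular_velocity(rpm)
    buoyancy = 1.0 - vbar_ml_per_g * rho_g_per_ml
    slope_si = (mw_g_per_mol / 1e3) * buoyancy * w ** 2 / (2.0 * R * temp_k)
    return slope_si / 1e4


# Round trip with hypothetical inputs (vbar, rho, and temperature are assumptions).
slope = expected_slope(3000.0, 0.73, 1.0, 40_000, 283.15)
print(round(mw_from_slope(slope, 0.73, 1.0, 40_000, 283.15)))
```

Dividing the fitted M by the sequence-derived monomer mass then indicates the oligomeric state.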
NMR Analysis-NMR spectra were obtained on a Bruker AMX 500 spectrometer (500.13 MHz for ¹H). Synthesized peptides were dissolved at 1.8 mM in 10 mM sodium acetate-d4 buffer (pH 4.5) containing 5% (v/v) D2O and 0.01% sodium 2,2-dimethyl-2-silapentane-5-sulfonate. Homonuclear two-dimensional spectra (DQF-COSY, NOESY, and TOCSY with water suppression) were recorded at 283 K. The mixing time was 200 ms for NOESY and 50 ms for TOCSY. ³JNα coupling constants were obtained from cross-peaks in the DQF-COSY spectrum. Resonance assignments were made by the standard sequential procedure using Sparky (22). An amide hydrogen-deuterium (H/D) exchange experiment was performed at 283 K in 10 mM sodium acetate-d4 buffer (pH 4.5) containing 0.01% sodium 2,2-dimethyl-2-silapentane-5-sulfonate, and the decay of signal intensity was monitored in one-dimensional spectra from 4 to 60 min. Protection factors were determined by dividing the intrinsic exchange rates, calculated by the method of Bai et al. (23), by the observed exchange rates. The cross-peaks in the NOESY spectrum were divided into four classes according to their volumes, corresponding to interproton distance restraints of 1.5-2.8, 1.5-3.5, 2.0-4.5, and 2.5-6.0 Å, respectively. Torsion angle restraints for φ were determined from ³JNα and classified into three categories, −65° ± 35°, −100° ± 70°, and −120° ± 50°, corresponding to ³JNα < 6.5, 6.5-8.0, and > 8.0 Hz, respectively. Simulated annealing calculations were carried out using CNS (24). Hydrogen bond distance restraints were included in the final calculation. Finally, of 50 initial structures generated from random conformations, 20 structures with no distance violation exceeding 0.2 Å and no torsion angle violation exceeding 3° were selected.
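The restraint classes described above amount to simple lookup rules. The sketch below encodes the four NOE distance classes and the three ³JNα-based φ ranges as stated in the text. Two points are assumptions: the text does not say which class a value of exactly 6.5 or 8.0 Hz belongs to (here both go to the middle class), and it does not give the volume thresholds used to bin NOEs, so the function takes the class index directly.

```python
# NOE volume classes 1-4 (strongest to weakest) -> (lower, upper) distance bounds in angstroms.
NOE_DISTANCE_BOUNDS = {
    1: (1.5, 2.8),
    2: (1.5, 3.5),
    3: (2.0, 4.5),
    4: (2.5, 6.0),
}


def noe_bounds(noe_class: int) -> tuple[float, float]:
    """Distance restraint for an NOE cross-peak of the given volume class."""
    return NOE_DISTANCE_BOUNDS[noe_class]


def phi_restraint(j_hn_ha: float) -> tuple[float, float]:
    """Backbone phi restraint (center, half-width) in degrees from 3J(HN-HA) in Hz.

    Assumption: values of exactly 6.5 or 8.0 Hz are placed in the middle class.
    """
    if j_hn_ha < 6.5:
        return (-65.0, 35.0)
    if j_hn_ha <= 8.0:
        return (-100.0, 70.0)
    return (-120.0, 50.0)


print(noe_bounds(3), phi_restraint(5.2), phi_restraint(7.1), phi_restraint(9.0))
```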
Fluorescence Analysis-Intrinsic tryptophan fluorescence spectra were measured at 283 K with a Jasco FP-6500 spectrofluorometer (Jasco). Each sample was dissolved at 10 μM in 10 mM sodium acetate-d4 buffer (pH 4.5). Emission spectra were recorded between 300 and 400 nm with an excitation wavelength of 295 nm.
Competitive Selection from a Spiked Library-A control library of polypeptides with an unstructured segment was constructed by replacing the chignolin-derived segment of the second generation library (Table 1) with a glycine-rich sequence (Gly3-Ser-Gly4). The control library and the conventional chignolin-based library are termed the "Gly library" and the "CLN library," respectively. The Gly library and CLN library were mixed at a ratio of Gly library/CLN library = 10:1 to prepare a spiked library. Biopanning selection from the spiked library against the Fc region was performed by the same procedure described above. Forty-eight clones were randomly isolated from each population (the initial spiked library and the polyclonal population after each round of biopanning selection). The amino acid sequences of the isolated clones were determined in three independent experiments, and the fraction of clones derived from the CLN library and the fraction of unique amino acid sequences were calculated.
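The two quantities computed here, the fraction of clones originating from the CLN library and the fraction of unique sequences, can be sketched as follows. How each clone is assigned to its library of origin is an assumption: the hypothetical classifier looks for the CLN motif Tyr-Asp-Pro-Xaa-Thr-Gly-Thr-Trp or the Gly3-Ser-Gly4 linker. The example sequences are placeholders, not data from this study.

```python
import re

CLN_MOTIF = re.compile(r"YDP[A-Z]TGTW")  # Tyr-Asp-Pro-Xaa-Thr-Gly-Thr-Trp
GLY_LINKER = "GGGSGGGG"                  # Gly3-Ser-Gly4


def library_of_origin(seq: str) -> str:
    """Hypothetical classifier: 'CLN', 'Gly', or 'unknown' for a clone sequence."""
    if CLN_MOTIF.search(seq):
        return "CLN"
    if GLY_LINKER in seq:
        return "Gly"
    return "unknown"


def selection_stats(clones: list[str]) -> tuple[float, float]:
    """Return (fraction of CLN-derived clones, fraction of unique sequences)."""
    n = len(clones)
    cln = sum(1 for s in clones if library_of_origin(s) == "CLN")
    return cln / n, len(set(clones)) / n


# Placeholder sequences for illustration only.
clones = ["VKRWSGYDPATGTWRSSIQ", "VKRWSGYDPATGTWRSSIQ",
          "MNQLTGGGGSGGGGHHKE", "PLEWSGYDPKTGTWRSSLN"]
print(selection_stats(clones))
```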

RESULTS
Functional Selection of Affinity Clones from Segment-elongated Protein Libraries-The designed protein consists of three segments: a foldable chignolin-derived segment (hereafter referred to as CLN) and two elongated segments fused to the N and C termini of CLN (Fig. 1). CLN is an eight-residue segment (Tyr-Asp-Pro-Xaa-Thr-Gly-Thr-Trp) obtained by truncating the N- and C-terminal glycine residues of chignolin and introducing a randomized amino acid at the middle position, where the original glutamic acid residue has been shown not to contribute to formation of the chignolin structure (12). We performed the segment elongation in two steps. The first generation library was generated by elongating a segment containing an eight-residue randomized region and a glycine residue, Xaa8-Gly, at the C terminus of CLN, followed by biopanning selection against a target molecule. A clone selected from the first generation library was used to build the second generation library, which was prepared by elongating Xaa8 at the N terminus of the selected clone.
Figure 1 (caption; beginning not recovered): The center inset shows the three-dimensional structure of chignolin (Protein Data Bank code 1UAO). Eight-residue randomized regions (Xaa8) were elongated from the chignolin-derived segment (termed CLN) in two steps. In the first generation library (1st gen. C-terminal lib.), Xaa8 was elongated from the C terminus of CLN and displayed on T7 phage. After affinity selection against a target, a second generation library (2nd gen. N-terminal lib.) was generated in which Xaa8 was elongated from the N terminus of an amino acid sequence selected from the first generation library.
We carried out biopanning selection against the human IgG Fc region as a model target. The first generation library of 5 × 10⁶ clones was displayed on T7 phage. The fourth round of selection enriched clones with distinctive sequences (Table 1). After ELISA screening, the most frequently recovered clone (17 of 21 clones), termed H6, was isolated. We then elongated H6 at its N terminus to evolve it into a higher affinity binder. The fifth round of biopanning selection from the second generation library of 2 × 10⁶ clones enriched a converged clone termed 2A1 (18 of 30 clones) (Table 1).
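A rough calculation shows how sparsely these libraries sample their randomized region. With 20 possible residues at each of eight positions, Xaa8 alone has 20⁸ ≈ 2.6 × 10¹⁰ possible sequences. The sketch assumes every clone is distinct and ignores codon redundancy, so it gives an upper bound on coverage; the convergence fractions are the counts quoted above.

```python
THEORETICAL_XAA8 = 20 ** 8  # distinct Xaa8 peptide sequences


def max_coverage(library_size: int, diversity: int = THEORETICAL_XAA8) -> float:
    """Upper bound on the fraction of sequence space sampled (all clones distinct)."""
    return min(library_size, diversity) / diversity


first_gen = max_coverage(5 * 10**6)
second_gen = max_coverage(2 * 10**6)
h6_convergence = 17 / 21   # clones matching H6 after the fourth round (from the text)
a1_convergence = 18 / 30   # clones matching 2A1 after the fifth round (from the text)
print(f"{first_gen:.1e} {second_gen:.1e} {h6_convergence:.0%} {a1_convergence:.0%}")
```

Strong convergence from a library that samples only about 0.02% of the Xaa8 space suggests that many distinct sequences can satisfy the selection.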
Sequence comparisons of the selected clones revealed notable convergence of amino acid sequences. The first generation selection clearly resulted in convergence of Arg-17 and Ser-18 and of hydrophobic residues (Ile or Leu) at position 20, whereas positions 21 to 24 diverged. The second generation selection resulted in convergence of Val-3, Arg-5, Trp-6, Ser-7, and Gly-8, although the most converged clone, 2A1, did not have an arginine residue at position 5. Most of these converged positions were located near CLN in the sequence. BLAST searches identified no sequences similar to the converged sequence.
Functional Analysis of an Evolved Protein Having the Converged Sequence-We analyzed the binding ability of a 25-residue protein, AF.2A1, possessing the converged sequence of 2A1, by SPR. Chemically synthesized AF.2A1 showed high affinity for the IgG Fc region, with an equilibrium dissociation constant (KD) of (2.2 ± 0.2) × 10⁻⁸ M. The affinity of AF.H6, a 17-residue peptide corresponding to clone H6 selected from the first generation library, was determined to be KD = (7.9 ± 2.8) × 10⁻⁵ M, showing that the second segment elongation enhanced the affinity by more than 3 orders of magnitude.
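The affinity gain from the second elongation can also be expressed as a binding free energy, ΔG = RT ln KD. The sketch below uses only the mean KD values reported above and assumes measurement at 298.15 K, which the text does not state.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol K)
T = 298.15          # assumed measurement temperature, K


def binding_free_energy(kd_molar: float, temp_k: float = T) -> float:
    """Binding free energy (kJ/mol) from a dissociation constant in M."""
    return R * temp_k * math.log(kd_molar)


kd_h6, kd_2a1 = 7.9e-5, 2.2e-8      # AF.H6 and AF.2A1, mean values from the text
fold = kd_h6 / kd_2a1               # about 3.6e3, i.e. more than 3 orders of magnitude
ddg = binding_free_energy(kd_2a1) - binding_free_energy(kd_h6)
print(f"{binding_free_energy(kd_h6):.1f} {binding_free_energy(kd_2a1):.1f} {fold:.2e} {ddg:.1f}")
```

Under this assumption, the second elongation contributes roughly 20 kJ/mol of additional binding free energy.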
Detailed functional analysis revealed the exquisite molecular recognition ability of AF.2A1. Interestingly and unexpectedly, AF.2A1 bound to the commercially purchased Fc region but not to any of the monoclonal IgGs tested, including a commercially available therapeutic antibody, nor to the purified Fc region that we prepared from monoclonal IgG (Fig. 2, a-c). We then tested whether AF.2A1 recognized IgG and Fc regions treated under various conditions: slightly acidic conditions (pH 4.0), heat treatment, and reducing conditions (Fig. 2, d-f). AF.2A1 bound to all of these treated IgG and Fc regions. These results imply that AF.2A1 recognized neither native IgG nor the native Fc region, but rather non-native conformers formed upon acidification or heat treatment (25)(26)(27). Despite this unusual binding mode, AF.2A1 was specific enough to capture the acid-treated Fc region from crude cell lysate (Fig. 2g). Thermodynamic analysis revealed that binding of AF.2A1 to the acid-treated Fc region was exothermic, with a van't Hoff enthalpy of −33 ± 5 kJ/mol (Fig. 2h). This enthalpy-driven binding also suggests a specific interaction rather than nonspecific hydrophobic adsorption.
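In the standard nonlinear van't Hoff model, RT ln KD is linear in ΔH, ΔS, and ΔCp, so the fit can be written as ordinary linear least squares. This sketch assumes that standard form, since Equation 1 is not reproduced here, and uses NumPy rather than the commercial fitting software used in the study. The self-check runs on noise-free values generated from hypothetical parameters; it is not a reanalysis of the reported data.

```python
import numpy as np

R = 8.314462618  # gas constant, J/(mol K)


def design_matrix(temps_k: np.ndarray, t0: float) -> np.ndarray:
    """Columns multiply (dH, dS, dCp) in R T ln KD = dH - T dS + dCp[(T - T0) - T ln(T/T0)]."""
    t = np.asarray(temps_k, dtype=float)
    return np.column_stack([np.ones_like(t), -t, (t - t0) - t * np.log(t / t0)])


def fit_vant_hoff(temps_k, kd_molar, t0: float = 298.15) -> tuple[float, float, float]:
    """Least-squares dH (J/mol), dS (J/(mol K)), and dCp (J/(mol K)) referenced to T0."""
    t = np.asarray(temps_k, dtype=float)
    y = R * t * np.log(np.asarray(kd_molar, dtype=float))
    coef, *_ = np.linalg.lstsq(design_matrix(t, t0), y, rcond=None)
    return float(coef[0]), float(coef[1]), float(coef[2])


# Self-check with hypothetical parameters (not measured values).
temps = np.linspace(283.15, 313.15, 7)
true_params = np.array([-33_000.0, -20.0, -500.0])
kd = np.exp(design_matrix(temps, 298.15) @ true_params / (R * temps))
print(fit_vant_hoff(temps, kd))
```

With real data, a negative fitted ΔH indicates exothermic binding. Noise makes ΔCp the least well-determined parameter, because its column varies only quadratically over a narrow temperature range.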
Structural Analysis of AF.2A1-To gain structural information on the segment-elongated protein, we determined the three-dimensional structure of AF.2A1 by ¹H NMR. AF.2A1 is soluble in aqueous solution and exists as a monomer, as determined by AUC and SEC (Fig. 3), and it gives well-dispersed cross-peaks in the fingerprint region (Fig. 4a). NOE cross-peaks characteristic of chignolin, such as those between the spatially neighboring Tyr-9 and Trp-16, were observed (Fig. 4b). H/D exchange measurements revealed that the amide protons of Asp-10 and Gly-14 were considerably protected (protection factor = kint/kobs = 13.4 and 27.3, respectively). Four hydrogen bond restraints involving these two amide protons were used in the structure calculation. A family of 20 structures was selected on the basis of low energy (Fig. 5a). The structures had good stereochemical quality and a backbone root mean square deviation of 0.91 ± 0.25 Å (Table 2). The three-dimensional structure of AF.2A1 revealed that the chignolin-derived β-hairpin was retained after fusion with the two elongated segments. The β-hairpin of AF.2A1 was stabilized by the chignolin-derived edge-to-face interaction between Tyr-9 and Trp-16 (Fig. 5b). Positioned by the β-hairpin of CLN, the two elongated segments interacted with each other, making the overall structure of AF.2A1 globular rather than extended.
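The protection factors follow from a single-exponential fit of the amide signal decay, PF = kint/kobs. The sketch below obtains kobs by a linear fit of ln I(t), which assumes a decay to zero with no baseline offset. The intrinsic rate would come from the method of Bai et al. (23), which is not reimplemented here, so both kint and the intensities in the self-check are hypothetical.

```python
import math


def fit_exchange_rate(times_min: list[float], intensities: list[float]) -> float:
    """k_obs (per min) from I(t) = I0 exp(-k_obs t), via least squares on ln I."""
    n = len(times_min)
    logs = [math.log(i) for i in intensities]
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    return -sxy / sxx  # slope of ln I versus t equals -k_obs


def protection_factor(k_int: float, k_obs: float) -> float:
    """PF = k_int / k_obs, with both rates in the same time units."""
    return k_int / k_obs


# Noise-free hypothetical decay sampled within the 4-60 min window used in the experiment.
times = [4, 8, 15, 30, 45, 60]
k_true = 0.02  # per min, hypothetical
decay = [100.0 * math.exp(-k_true * t) for t in times]
k_obs = fit_exchange_rate(times, decay)
print(round(k_obs, 6), round(protection_factor(0.5, k_obs), 2))  # hypothetical k_int = 0.5 per min
```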
The two elongated segments adopted structurally different conformations (Fig. 5a). The N-terminal segment formed a relatively extended structure and associated mostly with the C-terminal segment. In contrast, the C-terminal segment formed a compact structure and associated with both CLN and the N-terminal segment. This difference was quantified by a contact map (Fig. 6): there were twice as many inter-residue contacts between the C-terminal segment and CLN as between the N-terminal segment and CLN.
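A segment-level contact count like the one summarized by the contact map can be sketched as below. The segment boundaries follow from the construct: the N-terminal Xaa8 at residues 1-8, CLN at 9-16, and the C-terminal Xaa8-Gly at 17-25. The contact criterion (one representative point per residue, distance of at most 6 Å, covalent neighbors excluded) and the toy coordinates are assumptions, not the definition used for Fig. 6.

```python
import math
from itertools import combinations

# Segment boundaries of AF.2A1 (1-based residue numbers).
SEGMENTS = {"N": range(1, 9), "CLN": range(9, 17), "C": range(17, 26)}


def segment_of(residue: int) -> str:
    for name, span in SEGMENTS.items():
        if residue in span:
            return name
    raise ValueError(f"residue {residue} outside 1-25")


def segment_contacts(coords: dict[int, tuple[float, float, float]],
                     cutoff: float = 6.0, min_separation: int = 2) -> dict[tuple[str, ...], int]:
    """Count residue pairs in contact between different segments.

    Assumed criterion: one point per residue, distance <= cutoff, and sequence
    separation >= min_separation (which drops covalently bonded neighbors).
    """
    counts: dict[tuple[str, ...], int] = {}
    for i, j in combinations(sorted(coords), 2):
        if j - i < min_separation:
            continue
        a, b = segment_of(i), segment_of(j)
        if a == b or math.dist(coords[i], coords[j]) > cutoff:
            continue
        key = tuple(sorted((a, b)))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Toy coordinates (angstroms) for four residues, for illustration only.
toy = {3: (20.0, 0.0, 0.0), 5: (0.0, 4.0, 0.0), 16: (0.0, 0.0, 0.0), 20: (3.0, 0.0, 0.0)}
print(segment_contacts(toy))
```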
JOURNAL OF BIOLOGICAL CHEMISTRY • FEBRUARY 7, 2014 • VOLUME 289 • NUMBER 6
AF.2A1 formed a hydrophobic core within its structure, as seen in globular proteins. Trp-16 was surrounded by the side chains of Ile-20, Tyr-9, and Gln-5, and most of the Trp-16 side chain was buried (Fig. 7a), serving as the core residue. To probe the tryptophan environment, we measured the intrinsic tryptophan fluorescence spectra of AF.2A1 and of fragmented AF.2A1, an equimolar mixture of the N- and C-terminal elongated segments and CLN (Fig. 7b). The emission maximum of AF.2A1 was blue-shifted by 7.5 nm relative to those of fragmented AF.2A1 and chignolin, strongly indicating burial of Trp-16 in the protein interior. These fluorescence results, which point to formation of a hydrophobic core, are highly consistent with the NMR structure.
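The blue shift is the difference between two emission maxima. The sketch below takes λmax as the wavelength of highest intensity on the recorded 300-400 nm grid. The Gaussian spectra in the self-check are synthetic stand-ins with hypothetical peak positions, and a real analysis might smooth or fit the peak before locating it.

```python
import math


def emission_max(wavelengths_nm: list[float], intensities: list[float]) -> float:
    """Wavelength (nm) of the highest emission intensity."""
    best = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths_nm[best]


def blue_shift(reference_max_nm: float, sample_max_nm: float) -> float:
    """Positive value means the sample emits at a shorter wavelength than the reference."""
    return reference_max_nm - sample_max_nm


# Synthetic spectra on a 0.5-nm grid (hypothetical peak positions, not measured data).
grid = [300.0 + 0.5 * k for k in range(201)]


def gaussian(center: float) -> list[float]:
    return [math.exp(-((w - center) / 15.0) ** 2) for w in grid]


shift = blue_shift(emission_max(grid, gaussian(350.0)), emission_max(grid, gaussian(342.5)))
print(shift)
```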
Mutation Studies-The structural analysis demonstrated that CLN guides the overall structure of AF.2A1. To evaluate the effect of the structural support provided by CLN on function, we assayed the binding of a variant, termed AF.2A1Gly, in which the CLN sequence was replaced by a flexible glycine-rich sequence, Gly3-Ser-Gly4. The substitution decreased the affinity by more than 3 orders of magnitude; the KD of AF.2A1Gly was determined to be (2.3 ± 1.9) × 10⁻⁵ M. Cleavage of AF.2A1 into its three segments, i.e. the fragment mixture of CLN and the two elongated segments, resulted in almost complete loss of binding, and no meaningful affinity constant could be determined. This indicated that the CLN segment itself was not directly involved in binding. Thus, the affinity loss observed for AF.2A1Gly was attributed to the lack of the CLN structure.

Sequence comparisons of selected clones implied that the converged amino acid residues play an important role in function. We investigated residues important for binding by alanine scanning mutagenesis. Of the converged residues (Fig. 8a), the side chains of Val-3 and Ile-20 were buried in the protein, whereas those of five residues (Trp-6, Ser-7, Arg-17, Ser-18, and Ser-19) were exposed to the solvent. Alanine scanning mutagenesis showed that four of the five solvent-exposed residues were important for binding (Table 3). These residues were located sequentially or structurally in the vicinity of CLN and clustered in a particular area of the molecular surface, as is generally seen in protein-protein interaction hot spots (Fig. 8b) (28). Alanine substitution at position 20, where hydrophobic residues (Ile or Leu) converged and formed the hydrophobic core with Trp-16, caused a 40-fold reduction in affinity, indicating that the stable structure of AF.2A1 is important for its function. In contrast, the point mutant Q5R enhanced the affinity 2.5-fold (Table 3). Sequence comparison showed that arginine converged at position 5, whereas clone 2A1 encodes glutamine at this position. Structurally, Gln-5 was in contact with Trp-16 (Fig. 7a), implying that Gln-5 contributes less to structural stabilization than the converged arginine does.
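Fold changes in KD translate into changes in binding free energy, ΔΔG = RT ln(KD,mutant/KD,wild type). The sketch below converts the fold changes quoted here, plus the AF.2A1Gly comparison from the mean KD values, assuming 298.15 K. The 2 kcal/mol (about 8.4 kJ/mol) threshold is a cutoff commonly used in alanine-scanning hot-spot analyses and serves only as a point of reference.

```python
import math

R = 8.314462618e-3          # gas constant, kJ/(mol K)
T = 298.15                  # assumed temperature, K
HOT_SPOT_KJ = 2.0 * 4.184   # 2 kcal/mol, a common hot-spot threshold


def ddg_from_fold(fold_change_kd: float, temp_k: float = T) -> float:
    """ddG (kJ/mol) for a variant whose KD is fold_change_kd times that of AF.2A1."""
    return R * temp_k * math.log(fold_change_kd)


examples = {
    "position 20 to Ala (40-fold weaker)": 40.0,
    "Q5R (2.5-fold tighter)": 1 / 2.5,
    "AF.2A1Gly (mean KD ratio)": 2.3e-5 / 2.2e-8,
}
for name, fold in examples.items():
    ddg = ddg_from_fold(fold)
    print(f"{name}: {ddg:+.1f} kJ/mol, exceeds 2 kcal/mol: {ddg >= HOT_SPOT_KJ}")
```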
Contribution of a Structural Support to Functional Evolution-To confirm the contribution of CLN to functional evolution, we performed competitive selection using a spiked library consisting of a conventional CLN-based random library (termed CLN library) and a control library (termed Gly library) in which CLN was replaced with a glycine-rich unstructured motif (Gly3-Ser-Gly4). The CLN library initially made up about 10% of the total population. We performed seven rounds of biopanning selection from the spiked library against the Fc region. Sequence analysis of clones isolated from the population after each round showed that CLN library-derived clones were progressively enriched over successive rounds (Fig. 9). After the seventh round, the fraction of unique sequences was more than 70%, indicating that the selection did not converge on a single clone. These results strongly suggest that a structural support such as CLN can promote the functional evolution of a population.
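The rise of CLN-derived clones can be pictured with a simple two-population model: in each round, CLN clones are recovered e times more efficiently than Gly clones. The per-round enrichment factor e is a hypothetical parameter, not a value estimated in this study. The starting fraction of 1/11 follows from the 10:1 Gly/CLN mixing ratio.

```python
def cln_fraction(rounds: int, enrichment: float, gly_to_cln: float = 10.0) -> float:
    """Fraction of CLN-derived clones after a number of selection rounds.

    Assumes a constant per-round enrichment of CLN over Gly clones (hypothetical).
    """
    f0 = 1.0 / (1.0 + gly_to_cln)          # 1/11, about 9% at the start
    weighted = f0 * enrichment ** rounds
    return weighted / (weighted + (1.0 - f0))


for n in range(8):
    print(n, round(cln_fraction(n, enrichment=2.0), 3))
```

A constant enrichment factor yields a logistic rise in the CLN fraction; a modest advantage per round is enough to make CLN clones dominant within a few rounds.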

DISCUSSION
Evolutionary Significance of Introducing a Foldable Peptide Segment-AF.2A1 showed characteristic features of globular protein-like structure and function, rather than those of a flexible polypeptide, although its amino acid sequence differs greatly from that of any native protein motif. Functional, structural, and evolutionary analyses based on SPR, NMR, and competitive selection strongly suggest that the foldable chignolin, even though it is not directly involved in the function itself, can promote evolution. CLN played a major role in the structural organization of AF.2A1. Formation of the overall structure was mainly assisted by the chignolin-derived β-hairpin, which positioned the two elongated segments close to each other. AF.2A1 formed a hydrophobic core in which Trp-16 was almost entirely shielded from the solvent, as confirmed by intrinsic tryptophan fluorescence. This core is supported by surrounding residues (Fig. 7a). For instance, alanine substitution of isoleucine at position 20 significantly reduced the binding affinity (Table 3), consistent with the convergence of aliphatic residues at this position. The hydrophobic core containing Trp-16 was stabilized at least by the chignolin-derived edge-to-face interaction with Tyr-9 (Fig. 5b), which probably made Trp-16 suitable to serve as the core residue. These observations demonstrate that the structural support of CLN promoted structural organization of both main chain and side chain conformations.
Table 3 footnotes (table body not recovered): a, equimolar mixture of CLN and the N- and C-terminal segments; b, a meaningful value was not determined because the affinity was too low.

The first elongated segment (C-terminal) and the second elongated segment (N-terminal) differ in their local conformations, indicating that distinct events occurred during each segment elongation. Whereas the N-terminal segment is relatively extended, the C-terminal segment forms a compact structure by contacting CLN, guided by hydrophobic association with Trp-16 (Fig. 7). The importance of this compact local structure is indicated by the observation that fragmented AF.2A1 almost completely lost its binding ability. This implies that, in the first segment elongation, the C-terminal local structure formed transiently and conferred minimal function, as reflected in the weak affinity of AF.H6 (Table 3). In the second elongation, the N-terminal segment supported the conformation of the C-terminal segment, resulting in formation of an overall structure and in functional enhancement. This suggests an evolutionary scenario in which a transient local structure with minimal function emerged in an early evolutionary event, and later events stabilized this first local structure to achieve higher structural organization with enhanced function.
Note that clone 2A1 was selected for binding affinity, not for structural stability. This suggests that functional evolution could promote structural organization. Similarly, in earlier work, functional selection from random libraries yielded folded proteins stabilized by a zinc finger motif (29,30), although it has also been reported that functional selection gave rise to a high affinity binder whose structural organization was restricted to part of the overall structure (31). Structural organization through functional selection probably depends on a selection pressure under which structural restraints enhance function. Under such circumstances, structure is organized by functional selection, and function is enhanced by structural organization. A similar evolutionary process was illustrated by an in silico study (32) in which fixation of inter-residue contacts around an active site early in evolution gradually developed into formation of the overall structure. Considering the enormous size of sequence and conformational space, it is quite reasonable to infer such a stepwise, cyclic relationship between structural organization and functional enhancement.
Chignolin-derived structural support was involved in the gain-of-function. The amino acid convergence after functional selection occurred mainly at positions sequentially or structurally in the vicinity of CLN (e.g. Val-3, Trp-6, Ser-7, Gly-8, Arg-17, and Ser-18). Most of the converged residues with solvent-exposed side chains were located on the same side of the surface (Fig. 8b) and were critical for binding, as revealed by the mutation study. These residues form a hot spot region like those seen in native protein-protein interactions. Positions in the vicinity of CLN were the most structurally stabilized in the elongated segments and were probably more suitable for generating an active site because they minimize unfavorable entropic costs. Tryptophan and arginine are frequently occurring hot spot residues in protein-protein interactions, according to statistical analyses (28). The features of the AF.2A1 functional region therefore closely match the typical properties of protein-protein interactions. It is thus plausible that introducing a structural support into a polypeptide could promote gain-of-function, as well as structural organization, in protein evolution.
The observation that CLN affected both structure and function led us to propose an evolutionary model in which a foldable segment serves as the core for both structural organization and gain-of-function, thereby promoting primordial protein evolution. Indeed, the result of the competitive selection strongly suggested that CLN promoted functional evolution (Fig. 9). The contribution of CLN to molecular function was also confirmed by the mutation study, which showed that substitution of CLN with a glycine-rich linker reduced the affinity by about 3 orders of magnitude (Table 3). In light of our functional, structural, and evolutionary observations, we propose the term "evolving core" for the foldable segment in primordial protein evolution. Several protein folding studies have revealed that a partially folded segment in a protein, described as the "folding core" (33)(34)(35)(36)(37), promotes the folding process, probably by restricting the conformational space accessible to the polypeptide. Analogously, in our model, the evolving core promotes evolution by giving a polypeptide a partially stabilized structure. It has been reported that recombination of 30-40-residue polypeptide segments produces a novel folded protein in which each segment retains its own structural features and acts as a template for the fold (38). Our results suggest that similar events occur in the evolution of much smaller and more primitive proteins.
Structural stabilization is key to promoting evolution, as supported by simulation studies (39) and experimental observations (40). Previously reported de novo generated proteins also rely on structural supports for stabilization (29), which has led to a general view that metal coordination plays a major role in primordial protein evolution (9). This view is persuasive because even a relatively short peptide can form a metal coordination motif. Meanwhile, some short peptide segments can also fold into their own structures (12,13,41,42). As demonstrated here, introducing a foldable segment the size of a local structure in a modern protein can give rise to a structured protein through functional selection. The stabilization was attributed to formation of a β-hairpin secondary structure and of a hydrophobic core mediated by the tryptophan residue. It should be noted that tryptophan-mediated hydrophobic cores are found in other artificial or naturally occurring small sized proteins, such as the Trp cage (43) and the Pin WW domain (44); it is therefore reasonable to assume that such hydrophobic cores formed in early protein evolution. We conclude that a foldable peptide segment has great potential to promote primordial protein evolution.
Molecular Design of Small Sized Proteins for Practical Applications. We should emphasize the advantage of structurally guided segment elongation from a practical viewpoint. To date, protein engineering has developed and used small structural motifs as scaffolds, such as the zinc finger motif (7), tryptophan zipper (31,45), Trp cage motif (46), and disulfide-cyclized β-hairpin (47), to generate a wide variety of functional mini-proteins of tens of residues. In contrast to these scaffold-based approaches, our strategy is based on the evolutionary concept of "structurally guided stepwise segment elongation," in which an artificial protein evolves in a stepwise manner from a short flexible peptide to a more structured functional protein with the aid of a structural support. Similar evolutionary mechanisms have been reported in enzyme evolution (48) and antibody affinity maturation (49), where a flexible molecule with cross-reactivity or weak activity evolves toward a more specific, higher-activity protein accompanied by a reduction in entropy. Those studies have proposed that structural flexibility in the early stage could increase structural, and thereby functional, diversity. In light of these studies, a similar mechanism can be expected to operate in structurally guided stepwise segment elongation.
The use of the 10-residue chignolin has the potential to give rise to small sized proteins useful for industrial and pharmaceutical applications. AF.2A1 showed high affinity and specificity without depending on any additional stabilization such as chemical cross-linking (50), protein splicing (51), or metal coordination (7,9). Because it does not rely on a disulfide bond, AF.2A1 was able to function under reducing conditions. This characteristic would expand its range of use, for example as a specific binder that functions in the cytosol. This small sized protein also resisted misfolding and/or aggregation even after being boiled at 373 K (Fig. 10). Such small sized proteins are promising alternatives to high molecular weight proteinaceous binders. Molecular design concepts based on theories of primordial protein evolution will help meet the growing demand for low molecular weight binders with superior thermal and storage stability and low manufacturing costs.