Circular Permutation Directs Orthogonal Assembly in Complex Collagen Peptide Mixtures*

Background: Multiple types of collagens specifically assemble and co-exist as a major component in extracellular matrix.
Results: Orthogonal assembly of two collagen heterotrimers was induced by circularly permuting the sequences of a previously designed heterotrimer.
Conclusion: High collagen assembly specificity can be achieved with a simple sequence rearrangement.
Significance: This study provides insights on how fibrillar collagens might have arisen from gene duplication and domain shuffling.
Multiple types of natural collagens specifically assemble and co-exist in the extracellular matrix. Although noncollagenous trimerization domains facilitate the folding of triple-helical regions, it is intriguing to ask whether collagen sequences are also capable of controlling heterospecific association. In this study, we designed a model system mimicking simultaneous specific assembly of two collagen heterotrimers using a genetically inspired operation, circular permutation. Previously, surface charge-pair interactions were optimized on three collagen peptides to promote the formation of an abc-type heterotrimer. Circular permutation of these sequences retained networks of stabilizing interactions, preserving both triple-helical structure and heterospecificity of assembly. Combining original peptides A, B, and C and permuted peptides D, E, and F resulted primarily in formation of A:B:C and D:E:F, a heterospecificity of 2 of 56 possible stoichiometries. This degree of specificity in collagen molecular recognition is unprecedented in natural or synthetic collagens. Analysis of natural collagen sequences indicates low similarity between the neighboring exons. Combining the synthetic collagen model and bioinformatic analysis provides insight on how fibrillar collagens might have arisen from the duplication of smaller domains.

Collagen is a family of fibril-forming proteins with a highly consistent sequence pattern in helical regions, where sequences are repeats of Xaa-Yaa-Gly triplets. Genetic analyses suggest that collagen genes may originate from a common ancestor and have expanded by gene duplication and exon shuffling (1)(2)(3). However, different types of natural collagens discriminate well among one another, specifically assembling in the extracellular matrix (4). For example, types I and III are co-localized in dermis (5). Types I and V co-assemble into heterotypic fibers in corneal stroma (6). Isoforms of type V/XI co-exist in cartilage, where the composition varies with age (7). Although noncollagenous trimerization domains facilitate the folding of triple-helical regions (8-10), it is intriguing to ask whether collagen sequences also control this degree of heterospecific association. Design of synthetic collagens that recapitulate this natural specificity would be a powerful tool to investigate the relationship between primary sequence and folding heterospecificity.
Protein engineering strategies are often inspired by molecular processes driving natural protein evolution (11). In circular permutation, the order of amino acids is altered without significantly changing the three-dimensional fold. Conceptually, the termini of a protein are joined, resulting in a circular intermediate that is then broken at another location, giving new start and end positions. If successful, the folds of the original and circularly permuted proteins are similar with the exception of the new termini and the introduced loop that fuses the original termini (Fig. 1a). Circular permutation has been used in protein engineering to probe the location of structural intermediates in folding and to enhance protein stability (12)(13)(14)(15). The success of circular permutation as an engineering strategy hinges on the close spatial proximity of the N and C termini in most globular proteins (16), making it possible to fuse termini without the introduction of long, unwieldy linkers.
The N and C termini for fibrous proteins such as coiled-coils or collagen are far apart. The effect of circular permutation on such proteins has not been studied to the same extent as globular proteins. Due to the linearity of fibrous protein structure, circular permutation would result in three-dimensional domain swapping, where a structural superposition of the fold would align different portions of the sequence. This may have functional implications, such as altering interactions with other proteins or self-assembly into higher order structures.
Highly specific abc-type collagen heterotrimers have been previously designed using computational sequence patterning (17,18). We examine the effect of circular permutation on a previously computationally designed collagen peptide abc-type heterotrimer (17). The sequences of peptides A, B, and C were patterned such that they self-assembled into a stable triple helix only when all three were combined with equimolar stoichiometry. This is a specificity of one of ten possible trimeric states. The heterotrimer was stabilized by acidic and basic amino acids that formed an extensive network of surface charge-pair interactions. Due to the linear structure of collagen, these electrostatic interactions are local in both sequence and structure. We hypothesize that circular permutation of A, B, and C sequences will have minimal impact on these networks, resulting in a peptide system D, E, and F that maintains the structure and heterospecificity of the original design.
The sequence transformation of A:B:C to D:E:F swaps the N- and C-terminal domains, potentially affecting interactions with other proteins. It is interesting to consider the behavior of mixtures containing all six peptides. At least two outcomes may be considered (Fig. 1b): (i) domain-swapping drives fiber formation through assembly of sticky-ended precursors, or (ii) A:B:C and D:E:F assemble orthogonally, a stoichiometric specificity of 2 of 56 possible combinations of the six peptides. Circular permutation may either prevent protein-protein interactions between A, B, C and D, E, F, or it may promote supramolecular assembly. There are precedents for heteromeric peptide sticky-ended assembly; collagen fibril formation was promoted by engineering interchain disulfides that stabilized a sticky-ended precursor (20). Fiber-forming heterodimeric α-helix peptide coiled-coils were designed to assemble in a sticky-ended fashion, held together by internal hydrogen bonds and surface charge-pair interactions (21). In that example, circular permutation was used to generate a third peptide capable of forming a stable blunt-ended species that blocked fibril formation. Whether the designed collagen peptides proposed here form sticky-ended precursors en route to fibers, or orthogonally assemble as two abc-type heterotrimers, will depend on the relative stability of target and competing triple-helical states.

MATERIALS AND METHODS
Circular Dichroism-Circular dichroism experiments were performed on an AVIV model 400 spectrophotometer with optically matched 0.1-cm path length quartz cuvettes (model 110-OS; Hellma). All of the samples were prepared as 0.2 mM total concentration in pH 7, 10 mM phosphate buffer with or without 100 mM NaCl as specified in figure legends. Mixtures were heated at 50°C for 15 min and incubated at 4°C for ~48 h. Wavelength scans were conducted from 190 to 260 nm at 4°C (number of scans 1, averaging time 3.0 s). Values are reported as mean residue ellipticity (MRE), correcting for concentration, residue number, and cell path length.
For temperature denaturation experiments from 0 to 60°C, the ellipticity was monitored at 223 nm on the same instrument at a heating rate of 0.33°C/step, with 2-min equilibration time. The folded fraction was calculated as
$$F(T) = \frac{\theta(T) - \theta_U(T)}{\theta_F(T) - \theta_U(T)}$$
where $\theta(T)$ was the observed ellipticity at a temperature $T$, and $\theta_F(T)$ and $\theta_U(T)$ were ellipticities estimated from a linear fitting of folded and unfolded base lines. The apparent melting temperatures, $T_m$, were estimated as the temperature at which $F(T) = 0.5$.
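The folded-fraction calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the standard two-state form, with the folded and unfolded base lines fitted as straight lines over user-chosen temperature windows and the apparent T m taken where the folded fraction crosses 0.5. The function names, the fitting windows, and the synthetic melting curve are all hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def folded_fraction(temps, theta, folded_window, unfolded_window):
    """F(T) = (theta - theta_U) / (theta_F - theta_U) with linear base lines."""
    def baseline(window):
        lo, hi = window
        pts = [(t, y) for t, y in zip(temps, theta) if lo <= t <= hi]
        slope, intercept = fit_line([p[0] for p in pts], [p[1] for p in pts])
        return [slope * t + intercept for t in temps]
    theta_f = baseline(folded_window)
    theta_u = baseline(unfolded_window)
    return [(y - u) / (f - u) for y, f, u in zip(theta, theta_f, theta_u)]

def apparent_tm(temps, fraction):
    """Temperature where the folded fraction crosses 0.5 (linear interpolation)."""
    for i in range(1, len(temps)):
        f0, f1 = fraction[i - 1], fraction[i]
        if f0 > 0.5 >= f1:
            return temps[i - 1] + (0.5 - f0) * (temps[i] - temps[i - 1]) / (f1 - f0)
    return None  # no crossing within the scanned range

# Synthetic (not measured) melt: sigmoidal transition centred at 29 degrees C.
temps = [float(t) for t in range(0, 61)]
true_f = [1.0 / (1.0 + math.exp((t - 29.0) / 1.5)) for t in temps]
theta = [f * (4000.0 - 10.0 * t) + (1.0 - f) * (-500.0 + 2.0 * t)
         for f, t in zip(true_f, temps)]

frac = folded_fraction(temps, theta, folded_window=(0, 10), unfolded_window=(50, 60))
tm = apparent_tm(temps, frac)
```

The base-line windows matter in practice: for weakly cooperative transitions a folded base line may not be assignable at all, in which case only an upper bound on T m can be reported.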
Before the assay, mixtures A+B+C*+D+E+F and A+B+C+D+E+F* of 0.4 mM total concentration were prepared in pH 7, 10 mM phosphate buffer with 100 mM NaCl. They were incubated at 50°C for 15 min and then at 4°C for ~48 h to induce triple-helix formation. Samples were incubated with streptavidin-coated magnetic beads (Dynabeads MyOne Streptavidin T1; Invitrogen) at 4°C for 4 h to bind the biotin-tagged peptides on the beads. Beads were washed with 1 ml of washing buffer (pH 7, 10 mM phosphate buffer with 100 mM NaCl) three times to eliminate nonspecific interactions. The beads were suspended in 75 µl of washing buffer and incubated at 60°C for 15 min to denature triple helices, from which the supernatants containing the non-biotin-tagged peptides were collected. The beads were incubated in 6 M guanidine HCl at 96°C for 15 min to release the biotin-tagged peptides. Supernatants from these two elution processes were combined and characterized by HPLC and mass spectrometry.
HPLC and Mass Spectrometry-For the mixtures before the streptavidin-biotin binding assay, each peptide of 250 nM, e.g. 3.75 µl of the mixtures with 0.4 mM total concentration, was prepared in 0.2% trifluoroacetic acid (TFA) and loaded onto a pre-packed Poroshell 120, EC-C18, 2.1 × 100 mm, 2.7-µm column (Agilent Technology) in 0.2% TFA/water as the mobile phase A and eluted with a linear gradient of 2-25% of 0.2% TFA in acetonitrile (phase B) in 40 min. HPLC profiles were monitored as UV absorbance at 214 nm. Due to identical molecular weights, original and permuted sequences had similar HPLC retention. To identify species in peak fractions with retention times from 16 to 22 min, MS and tandem MS experiments in reflector mode were carried out on an ABI-MDS SCIEX 4800. The peak area was integrated with Gilson Unipoint System software.
Stability Scores-The details of stability score calculation have been described previously (17,22). Briefly, a discrete sequence-based model was adopted from an α-helical coiled-coil design (23); the stability score counted electrostatic interactions between proximal residues on adjacent chains, plus an imino acid (Pro + Hyp) content term. The best stability score among all the possible staggering registries was selected to represent a particular stoichiometry.
Sequence Similarity-Exon sequences of collagen types I, III, and V were downloaded from the ENSEMBL genome browser (sequence ID codes ENST00000225964, ENST00000304636, ENST00000371817, ENST00000374866, ENST00000264828, and ENST00000297268). A few large exons consisting of multiples of 15- or 18-residue-long fragments were subdivided into smaller domains. Amino acids were categorized by physicochemical properties, i.e. polar, hydrophobic, positively charged, negatively charged. The amino acids in the same category were considered to be identical. Multiple alignments were generated for fragment pairs of different lengths, among which the highest similarity score was selected.
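A reduced-alphabet similarity of this kind can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used: the text names the four categories but not their members, so the grouping in `CLASSES` is hypothetical, and the ungapped, triplet-stepped sliding alignment for fragments of unequal length is likewise an assumed simplification.

```python
# Hypothetical four-class grouping; the source names the classes, not the members.
CLASSES = {
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQYC",
    "hydrophobic": "AVLIMFWPG",
}
RESIDUE_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}

def class_similarity(frag_a, frag_b):
    """Fraction of aligned positions whose residues fall in the same class."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have equal length")
    same = sum(RESIDUE_CLASS[a] == RESIDUE_CLASS[b] for a, b in zip(frag_a, frag_b))
    return same / len(frag_a)

def best_similarity(frag_a, frag_b, step=3):
    """Highest ungapped similarity when sliding the shorter fragment along the
    longer one in steps of one triplet (assumed, to keep Gly in register)."""
    short, long_ = sorted((frag_a, frag_b), key=len)
    offsets = range(0, len(long_) - len(short) + 1, step)
    return max(class_similarity(short, long_[o:o + len(short)]) for o in offsets)
```

For example, `class_similarity("GPK", "GPR")` treats Lys and Arg as identical, whereas `class_similarity("GPK", "GPD")` counts the third position as a mismatch.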

RESULTS AND DISCUSSION
Sequence Design-To create a triple helix, a few key design constraints must be considered. Collagen consists of repeated Xaa-Yaa-Gly triplets. In natural collagens these can extend for >1000 amino acids, but in the model peptides only 10 repeats are used, resulting in 30-residue-long chains. A network of interchain backbone hydrogen bonds maintains the triple-helical fold, and additional stability and specificity are modulated by the identities of solvent-exposed residues. Xaa and Yaa positions are often proline or hydroxyproline, respectively, which stabilize the extended polyproline-II conformation of the triple-helical backbone. Charged amino acids may also be placed at these positions to promote electrostatic interactions that modulate stability and specificity (24-27). Previously, three collagen peptide sequences A, B, and C were designed to form an abc-type heterotrimer A:B:C to the exclusion of competing stable homotrimers and two-species heterotrimers (i.e. A:2B, 2B:C, etc.) (17). This was achieved using a simulated evolution protocol (28) that optimized the network of favorable charge-pair interactions in the target ABC state while maintaining an energy gap between target and competing states.
Circular permutation of A, B, and C should maintain the majority of favorable charge-pair interactions (Fig. 2a). Three new sequences, D, E, and F, were generated by circular permutation of A, B, and C: D = p_5(A), E = p_5(B), and F = p_5(C), where the index x of p_x() refers to the number of triplets moved as a block from the C terminus to the N terminus: A, PKGPKGPKGKOGPDGDOGDOGDOGPKGPKG; B, PDGDOGDOGDOGPDGKOGPDGPDGPDGDOG; The resulting D:E:F target was predicted to be stable, maintaining all except two of the interactions modeled in A:B:C. Additionally, a significant energy gap was calculated between D:E:F and competing states (Fig. 2b). It was expected that D:E:F would retain structure and heterospecificity.
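The permutation operator defined above is straightforward to express in code. The sketch below is illustrative only; the function name `circular_permute` is hypothetical, and the D and E strings it produces are simply what the stated definition yields when applied to the A and B sequences given in the text, not sequences quoted from the source.

```python
def circular_permute(seq, x, triplet=3):
    """Return p_x(seq): move the last x triplets, as one block, to the N terminus."""
    if len(seq) % triplet:
        raise ValueError("sequence length must be a multiple of the triplet size")
    n_triplets = len(seq) // triplet
    k = (x % n_triplets) * triplet      # residues moved from the C terminus
    cut = len(seq) - k
    return seq[cut:] + seq[:cut]

A = "PKGPKGPKGKOGPDGDOGDOGDOGPKGPKG"
B = "PDGDOGDOGDOGPDGKOGPDGPDGPDGDOG"

D = circular_permute(A, 5)   # five-triplet permutation of A
E = circular_permute(B, 5)   # five-triplet permutation of B
```

For a 10-triplet peptide the five-triplet permutation is its own inverse, so applying it twice returns the original sequence; shifts of x and x + 10 triplets are equivalent.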
Circular Permutation Preserves Structure and Heterospecificity-To test the computational design, the structure and thermal stability of the original and circularly permuted heterotrimers were assayed by circular dichroism. Ten peptide mixtures at equivalent total peptide concentrations were pre-  (Fig. 2, e and f). Other mixtures showed little or no triple-helical structure, indicating that the heterospecificity was preserved by circular permutation. Additionally, only A:B:C and D:E:F mixtures produced species with significant stabilities and cooperative unfolding transitions (Fig. 2, c and d). The melting temperature, T m , of A:B:C at 29°C was slightly greater than that of D:E:F, T m = 24°C, consistent with the loss of two charge pairs upon circular permutation. Whereas B+C mixtures did form weak triple helices with borderline cooperative unfolding transitions of T m < 5°C, equivalent species were not found in E+F mixtures.
Assessing Formation of Sticky-ended Precursors-Combining all six peptides from the two groups could lead to the creation of sticky-ended precursors that further assemble into fibers. Two peptides from a group plus the circular permutation of the third peptide could form a sticky-ended precursor with an overlap of 15 residues as pictured in Fig. 1b. There are six such combinations: A:B:F, A:E:C, D:B:C, D:E:C, D:B:F, and A:E:F. When examined by circular dichroism spectroscopy, none had a positive ellipticity at 223 nm diagnostic of a triple helix, nor did any exhibit cooperative loss of structure upon thermal denaturation (supplemental Table S1). Dynamic light scattering measurements showed no evidence of higher order assembly. Single particle size distribution peaks at approximately 2 nm were consistent with the observed hydrodynamic radius of triple helix and random-coil states in previous studies (supplemental Fig. S1) (29). Negligible signals were found for larger particle sizes. Solutions remained clear and did not form precipitates even after storage for long periods of time. In all, no evidence was found to support the existence of sticky-ended precursors or higher order structures under the conditions studied. An insufficient number of backbone-backbone hydrogen bonding interactions in the offset species might account for the absence of such intermediates.
Assessing Orthogonal Assembly-The term "orthogonal assembly" is used to describe the spontaneous formation of more than one abc-type heterotrimer where each is composed of a unique set of peptides. As such, orthogonal assembly would entail the folding of the six peptides A+B+C+D+E+F into A:B:C and D:E:F. This requires that these species are significantly more stable than competing mixtures. There are 216 (6^3) possible triple-helical states that can be constructed from combinations of the six peptides. This corresponds to 56 stoichiometric combinations compatible with a collagen trimer.
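These counts can be checked by direct enumeration. The sketch below treats a triple-helical state as an ordered assignment of peptides to the three chain positions and a stoichiometry as the unordered composition; the helper names are hypothetical, and the convention that a higher score is "best" when collapsing states onto stoichiometries is an assumption made for illustration.

```python
from itertools import combinations_with_replacement, product

def enumerate_states(peptides):
    """Ordered three-chain states and unordered stoichiometries for a peptide set."""
    ordered = list(product(peptides, repeat=3))
    stoichiometries = list(combinations_with_replacement(peptides, 3))
    return ordered, stoichiometries

def best_score_by_stoichiometry(peptides, score):
    """Collapse ordered states onto stoichiometries, keeping the best score.

    `score` is any caller-supplied function of an ordered state; higher is
    assumed to be better here.
    """
    best = {}
    for state in product(peptides, repeat=3):
        key = tuple(sorted(state))
        value = score(state)
        if key not in best or value > best[key]:
            best[key] = value
    return best

ordered6, stoich6 = enumerate_states("ABCDEF")   # six-peptide mixture
ordered3, stoich3 = enumerate_states("ABC")      # original three-peptide design
```

With three peptides the same enumeration gives the ten trimeric stoichiometries against which the original A:B:C design was specified.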
Given the subset of amino acids used in these designs, it is possible to score the stability of each state as a weighted sum of the number of imino acids (Pro + Hyp) and the total predicted charge-pair interactions between structurally adjacent residues. Despite the simplicity of this metric, it is reasonably predictive of observed stabilities (22,30). Computed stabilities from all 216 states were assigned to the 56 stoichiometries based on the best stability score among the set of triple-helical states associated with that mixture (supplemental Table S2). Target heterotrimers A:B:C and D:E:F had the highest stabilities both computationally and experimentally (Figs. 3b and 4a). The next best scoring stoichiometries were C:E:F and the corresponding circularly permuted state, p_5(C:E:F) = F:B:C. Both mixtures showed weaker triple-helical structures and stabilities with T m of 15°C and 11°C, respectively (Fig. 3b). Other states and their corresponding circular permutations with similar energies, A:B:F and D:E:C or A:B:D and D:E:A, did not form cooperative triple-helical structures. Although it is reasonable to expect that A:B:C and D:E:F orthogonally assemble in mixture given their observed stabilities, spectroscopic measurements alone are insufficient to demonstrate this. A pulldown assay was developed to observe the composition of triple-helical species formed in complex mixtures. Biotin separated by a Gly-Gly linker was conjugated to N termini of peptides C and F, referred to as C* and F*. Triple-helical structure and stability of A:B:C* were very similar to the unmodified A:B:C mixture, indicating that the modification did not perturb folding (supplemental Fig. S2). The same was found for D:E:F* compared with D:E:F. A:B:C*:D:E:F and A:B:C:D:E:F* mixtures were allowed to equilibrate overnight using a standard folding protocol. Streptavidin-coated magnetic beads were subsequently added to extract the biotinylated constructs and associated peptides.
Peptides released from streptavidin-coated beads were identified by mass spectrometry and quantified from peak areas in the HPLC chromatograms.
The six peptides were initially combined in equimolar concentrations (Fig. 4, b and c). In both C* and F* mixtures, three major peptide peaks remained after the streptavidin bead pulldown. The post-C* pulldown peptides identified by tandem MS were A, B, and C* (supplemental Fig. S3). The ratios of peak areas of peptide A, B, and C* normalized to C* were 0.5:0.5:1 (supplemental Table S3), indicating C* predominantly bound to A and B, and half of the C* peptides were participating in heterotrimers. D, E, and F* were found in the same relative ratios in the post-F* pulldown, indicating that F* interacted specifically with D and E in the six-peptide mixture (Fig. 4c). These observations support orthogonal assembly of A:B:C and D:E:F in solution.
Detectable peaks corresponding to E and F were also pulled down with C*. The relative ratios of E and F to C* were 0.06:0.05:1, suggesting the presence of a minor C:E:F species, consistent with the observed stability of C:E:F by circular dichroism spectroscopy. In a focused pulldown experiment where only three peptides, E, F, and C*, were included in the initial mixture, 5-fold greater amounts of both E and F were associated with C* compared with the six-peptide mixture (supplemental Fig. S4). This change in C:E:F fraction in the presence of the remaining three peptides is presumably due to competing formation of the more stable A:B:C and D:E:F species. In contrast, the B:C:F species, which was shown to have marginal structure and stability by circular dichroism, was not found in a focused pulldown experiment consisting initially of B, C, and F*. The low stability of these competing unstable complexes would preclude them from forming at higher temperatures where A:B:C and D:E:F are present.
Limits of Circular Permutation-The orthogonal assembly of A:B:C and D:E:F was the result of the stability of target species and the instability of competing species, including sticky-ended precursors of higher order structure. Circular permutation serves to shift the network of electrostatic interactions so that they are out of phase with the optimal stagger of the triple helix. The five-triplet circular permutation is the largest perturbation possible in a 10-triplet peptide. We wanted to investigate the effect of smaller shifts, ranging from one to four triplets on triple helix formation. Would it be plausible to create a larger interactome where, for example, a three-triplet circular permutation would assemble orthogonally to the zero and five-triplet circular permutation sequences?
Four additional permutations of A were synthesized: p_1(A),    Table S5). Consistent with the predicted stability scores, p_1(A):B:C and p_2(A):B:C formed stable heterotrimers with stabilities lower than A:B:C (Fig. 5). Due to the inability to assign folded base lines, the melting temperatures of mixtures p_3(A):B:C and p_4(A):B:C were estimated to be <10 and 5°C, respectively. It was similarly challenging to assign a specific T m to p_1(A):E:F. Experimental and calculated stabilities correlate in both the B+C and E+F backgrounds.
In these designs, a five-triplet permutation is optimal for preventing cross-assembly. Smaller permutations result in marginally stable species where stability correlates with the extent of permutation and where triple-helical structures are formed in both backgrounds. In the 10-triplet systems commonly studied, we would not expect a p_x(A,B,C) + p_y(A,B,C) mixture to orthogonally assemble for x-y 5. This places a practical limit on the extent to which circular permutation can be utilized to direct orthogonal assembly in a multispecies collagen interactome. A nine-peptide orthogonally assembling interactome would not be generated by simply combining x = 0,3,6 permutations of p_x(A,B,C). Other approaches for diversifying the sequence would be needed to achieve specificity of molecular recognition in mixtures greater than six components.
Implications for Fibrous Collagen Evolution-It is interesting to consider whether the same sequence constraints promoting orthogonal assembly in synthetic collagen peptides might prevent misfolding in much longer fibrillar collagens. The specificity of A:B:C was achieved by intentionally incorporating an energy gap in the design protocol. This gap was preserved in D:E:F. However, gaps between targets and all other competing states in the six-peptide mixture were not explicitly modeled at the design stage. This unintentional gap arose from sequence diversity, namely the disparate charge patterns in the N- versus C-terminal halves of the designed sequences. This diversity comes from the stochastic nature of the simulated-evolution computational design protocol. Low sequence complexity or short periodic sequence motifs would be expected to increase the probability of undesired interactions between N- and C-terminal domains, resulting in a mixture of misfolded states. An equivalent phenomenon is found in naturally occurring multidomain proteins, where sequence diversity across individual domains is an important factor in preventing misfolding (31,32). In particular, adjacent subunits tend to have lower sequence identities than more distant pairs, as their spatial proximity makes misfolding more likely (33).
To examine the sequence diversity within collagen domains, it was first necessary to define the domains themselves. Unlike multidomain proteins such as fibronectin or titin, there are no clear structural boundaries in fibrillar collagens, which consist of a single, extended triple helix. In this case, a genetic definition of the domain based on exons is more useful. In fibrillar collagens, exons are predominantly 45 or 54 nucleotides in length, corresponding to 15 or 18 amino acid domains (34). This makes the fundamental domain in fibrillar collagen a five- or six-triplet sequence; the synthetic peptides would consist of the equivalent of two domains.
An autocorrelation plot of sequence similarity versus the number of intervening domains was computed for several human fibrillar collagens. A peak in similarity is observed for domains separated by one intervening domain, i.e. domain n versus n+2 (Fig. 6). This supports a domain repeat expansion model where gene duplication occurred for pairs of exons, rather than one exon at a time (35). If a constraint on multidomain protein folding is the low sequence similarity of adjacent domains, the duplication of pairs of domains would be a mechanism to accomplish this. The difference in sequence similarity of domain n to itself (100%) and to n+1 (~37%) is large. With reference to domain n, the difference in similarity of n+2 and n+3 suggests that sequence diversity is maintained, perhaps for the reasons outlined above, to mitigate misfolding.
FIGURE 5. Limits of circular permutation. For a series of circular permutations, p_x(A) for x from 0 to 5, opposing stability trends are observed in the B+C versus E+F background. a, temperature melting profiles and b, circular dichroism spectra at 4°C are shown. Samples were prepared in phosphate buffer, pH 7, without NaCl. Due to the elimination of NaCl in buffer, MRE here was higher than in Fig. 3, where 100 mM NaCl was used.
FIGURE 6. Sequence diversity and domain spacing in collagen. An autocorrelation plot of sequence similarity of exon n to n+1, n+2 … n+4 (red trace) shows a higher than average similarity of domains separated by one exon, and a lower than average similarity of adjacent domains (n+2 versus n+3). Sequence similarities were normalized to the average similarity of all exon-exon pairs. Various fibrillar collagen types are as labeled.
Despite the absence of structural domains within fibrillar collagen, sequence diversity between adjacent genetically specified domains is present and may be a mechanism to prevent misfolding. Although collagen registry is specified during procollagen folding by N- and C-terminal globular domains, the matrix is a dynamic structure that is constantly being remodeled (36), and such sequence diversity may minimize frustration caused by misannealed states during transient unfolding/refolding.
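The domain-separation analysis in this section can be sketched as an average of pairwise similarity at each separation, normalized to the average over all domain pairs. The code below is a schematic illustration: the function names are hypothetical, the identity-based `fraction_identical` stands in for whatever similarity measure is actually applied, and the toy domains are invented to show the alternating pattern, not taken from any collagen sequence.

```python
def fraction_identical(a, b):
    """Placeholder similarity: fraction of identical positions (equal lengths)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def similarity_by_separation(domains, similarity, max_lag=4):
    """Mean similarity of domain n to n+lag, normalized to the all-pairs mean."""
    n = len(domains)
    all_pairs = [similarity(domains[i], domains[j])
                 for i in range(n) for j in range(i + 1, n)]
    overall = sum(all_pairs) / len(all_pairs)
    profile = {}
    for lag in range(1, max_lag + 1):
        values = [similarity(domains[i], domains[i + lag]) for i in range(n - lag)]
        if values:                       # skip lags longer than the domain list
            profile[lag] = (sum(values) / len(values)) / overall
    return profile

# Toy input (invented): an alternating two-domain repeat.
toy_domains = ["GPKGDE", "GAPGSP", "GPKGDE", "GAPGSP", "GPKGDE", "GAPGSP"]
toy_profile = similarity_by_separation(toy_domains, fraction_identical)
```

In the toy repeat the normalized similarity is above 1 at even separations and below 1 at odd ones, which is the qualitative signature expected if pairs of domains, rather than single domains, were duplicated.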

CONCLUSIONS
Circular permutation preserves three-dimensional structure in both globular and fibrous proteins. In our case, both the triple-helical fold and abc-type heterospecificity are maintained. Whereas circular permutations of globular proteins generally do not perturb intermolecular interactions (37,38), fibrous protein circular permutation swaps domains around the point of permutation, modifying intermolecular interactions and in this case promoting orthogonal assembly.
Success of circular permutation as a strategy for engineering orthogonal assembly hinges on the sequence properties of each peptide, particularly the presence of diverse, aperiodic charge patterns. Computationally designed collagen heterotrimers differ in this respect from rationally designed sequences, which often use repeating charge patterns through the entire peptide length (39,40). Our results suggest that it is not essential to design heterotrimers using 10-triplet sequences. Instead, libraries of five-triplet fragments can be combined to yield longer designs that preserve heterospecificity as long as sequence similarity of adjacent fragments is low. This reduces potential sequence space by 10 orders of magnitude, from 5^30 (~10^21) to 5^15 (~10^10) trimeric states, making it computationally feasible to pursue full sampling, expand the types of amino acids used, or increase the level of theory used to evaluate designs.
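The size of this reduction follows directly from the two exponents. A short check, with the base 5 and the exponents 30 and 15 taken as stated in the text and the variable names chosen here for illustration:

```python
import math

full_space = 5 ** 30   # 10-triplet designs, as stated in the text
half_space = 5 ** 15   # five-triplet fragments, as stated in the text

log_full = math.log10(full_space)   # close to 21
log_half = math.log10(half_space)   # close to 10.5
orders_saved = log_full - log_half  # equals log10(5 ** 15)
```

Because 5^30 / 5^15 = 5^15, the saving is itself about 3 × 10^10, i.e. roughly ten orders of magnitude.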
The observed low sequence similarity of adjacent domains in natural collagens has implications for the design of long engineered collagens. The majority of biophysical analyses on collagen have focused on short peptides due to their synthetic accessibility. Recent studies on recombinant bacterial collagens demonstrate that it is possible to engineer much longer sequences and characterize their stability, folding kinetics, and other biophysical properties (41)(42)(43). A strategy for the engineering of long collagen sequences would be to use computational design to specify individual domains on the order of five to six triplets and then concatenate them in such a way as to minimize sequence identity of adjacent domains. This strategy may be extended to the design of other oligomeric fibrous proteins such as coiled-coils.
Six-component orthogonal assembly represents an unprecedented degree of molecular recognition specificity for collagen. Complex orthogonally assembling systems have been designed using α-helical folds (44-46) and observed in natural protein-protein interactions (19,47). One advantage of a triple-helical interaction network is the lack of structural overlap with existing interactomes. Due to the high proline content, collagenous peptides are unlikely to form other secondary structures.
Therefore, collagen peptides such as these expand the synthetic biology toolkit for directing protein-protein interaction networks or promoting self-assembly of complex structures.