The Cyclization and Polymerization of Bacterially Expressed Proteins Using Modified Self-splicing Inteins*

Mini-inteins derived from Synechocystis sp. (Ssp DnaB intein) and Mycobacterium xenopi (Mxe GyrA intein) that have been modified to cleave peptide bonds at their C and N termini, respectively, were cloned in-frame to the N and C termini of a target protein. Peptide bond cleavage of the modified inteins generated an N-terminal cysteine and a C-terminal thioester on the same protein. These complementary reactive groups underwent intra- or intermolecular condensation to generate circular or polymeric protein species with a new peptide bond at the site of ligation. Three cyclic peptides, BBP, an organ-specific localization peptide; RGD, an inhibitor of platelet aggregation; and CDR-H3/C2, which inhibits HIV-1 replication, were isolated using the two-intein system. BBP, RGD, and CDR-H3/C2 had masses of 977.1, 1119.9, and 2098.6 g/mol, respectively, as determined by matrix-assisted laser desorption-time of flight mass spectrometry, which agreed well with the values of 977.2, 1120.3, and 2098.3 g/mol, respectively, predicted for the cyclic species. This system was used to cyclize proteins as large as 395 amino acids. Furthermore, multimers of thioredoxin were formed upon concentration of the reactive species, indicating the potential to form novel biomaterials based on fibrous proteins.

Protein splicing elements, termed inteins (1), catalyze their excision from a precursor protein with the concomitant fusion of the flanking protein regions (reviewed in Refs. 2-5). Two peptide bonds are broken and a new peptide bond is formed during the protein splicing process. Inteins thus represent a potentially powerful means of protein manipulation. Controllable fission of the peptide bond at either the N or the C terminus of an intein has allowed the development of novel methodologies. Modified inteins have been used to isolate proteins with a thioester or thiocarboxylate on the C-terminal α-carbon (6-11). Cysteine and synthetic peptides with an N-terminal cysteine were demonstrated to fuse to the thioester-tagged protein through a native peptide bond (6,7,9,10) using chemistry described previously (12,13). This technology, termed intein-mediated protein ligation (IPL) (7,14) or expressed protein ligation (9,10), has been used to label proteins, to isolate cytotoxic proteins, and to investigate protein structure/function relationships (reviewed in Refs. 15 and 16). Further work has extended IPL to include the in vitro ligation of two bacterially expressed proteins using either an intein with controllable C-terminal cleavage activity (8,14,17) or proteolysis to generate a protein with the requisite N-terminal cysteine (18,19).
In this report we describe the concomitant use of two inteins, one with C-terminal cleavage activity and the other with N-terminal cleavage activity, to specifically generate a protein possessing both an N-terminal cysteine and a C-terminal thioester. Upon incubation this protein formed cyclic and/or polymeric species. This approach, termed the TWIN (two intein) system, allowed the facile isolation of circular peptides involved in organ-specific localization (20), inhibition of platelet aggregation (21), and inhibition of HIV-1 replication (22). Furthermore, future studies on protein polymerization may allow the production and investigation of analogs of fibrous proteins, such as silk.

EXPERIMENTAL PROCEDURES
TWIN Vector Construction-All vectors are derived from pTYB1 (New England Biolabs) or pTXB1 (7). The TWIN vectors utilized the Ssp DnaB mini-intein mutated for C-terminal cleavage (8), Ssp mini-intein (Cys1 → Ala) (a 154-aa intein derived from the Synechocystis sp. DnaB gene with a Cys1 to Ala mutation), and the Mxe GyrA intein (Asn198 → Ala) (an intein in the Mycobacterium xenopi GyrA gene with an Asn198 to Ala mutation), modified for N-terminal cleavage (7). pSXB1 was composed of the Ssp mini-intein (Cys1 → Ala) followed by a multiple cloning site (MCS) (5′-GGA AGA GCC ATG GAA TTC TCG TCG ACG GCG GCC GCC TCG AGG GCT CTT CC-3′), the Mxe GyrA intein (Asn198 → Ala), and the chitin binding domain (CBD) from Bacillus circulans (23). Plasmid pSTX1 contains the gene for Escherichia coli thioredoxin with coding sequence for 3 amino acids added to the N and C termini of the protein: Cys-Gly-Gly and Met-Arg-Met, respectively, inserted between the two inteins in pSXB1. pBSCXB1 and pBSCXB2 are TWIN vectors that place a CBD at the N and C termini of the precursor protein. pBSCXB1 encodes a fusion protein of the CBD-Ssp mini-intein (Cys1 → Ala)-MCS-Mxe GyrA intein (Asn198 → Ala)-CBD, where the MCS is 5′-GGA AGA GCT ACC ATG GGC GGC CGC GAA TTC CTC GAG GGC TCT TCC-3′. pBSCXB2 is identical to pBSCXB1 except that the MCS region is 5′-TGC CGC GCC ATG GGC GGC CGC AAT GGA AGA GCT CGA ACA ACA ACA ACA ATA ACA ATA ACA ACA ACC TCG GGA TCG AGG GAA GGG GTA CGC TCG AGG GC-3′. pBSCXB2 encodes a factor Xa (FXa) cleavage site 5 amino acids upstream from the N terminus of the Mxe GyrA intein. Insertion of the E. coli malE sequence, encoding maltose binding protein (MBP) (24), between the NcoI and SacI sites in pBSCXB2 yielded pBSMXB1. pBSTXB1 has the thioredoxin gene cloned in place of MBP in pBSMXB1.
MBP purified from pBSMXB1 and thioredoxin purified from pBSTXB1 have Cys-Arg-Ala and Ser-Ser-Asn10-Leu-Gly-Ile-Glu-Gly-Arg-Gly-Thr-Leu-Glu-Gly added to their N and C termini, respectively.
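As an illustrative check of the cloning step described above, the following minimal Python sketch locates the NcoI and SacI recognition sequences in the pBSCXB2 MCS. The MCS sequence is copied from the text; the helper name `find_sites` and the dictionary layout are hypothetical and introduced only for this example.

```python
# MCS of pBSCXB2 as printed in the text (spaces separate the printed triplets).
PBSCXB2_MCS = (
    "TGC CGC GCC ATG GGC GGC CGC AAT GGA AGA GCT CGA ACA ACA ACA ACA "
    "ATA ACA ATA ACA ACA ACC TCG GGA TCG AGG GAA GGG GTA CGC TCG AGG GC"
)

# Recognition sequences of the two enzymes named in the text.
RECOGNITION_SITES = {"NcoI": "CCATGG", "SacI": "GAGCTC"}


def find_sites(mcs, sites):
    """Return the 0-based index of the first occurrence of each motif (-1 if absent)."""
    seq = mcs.replace(" ", "").upper()
    return {name: seq.find(motif) for name, motif in sites.items()}


if __name__ == "__main__":
    for enzyme, position in find_sites(PBSCXB2_MCS, RECOGNITION_SITES).items():
        print(f"{enzyme}: first site at index {position}")
```

Both sites are present, with NcoI upstream of SacI, which is consistent with inserting malE between them.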
Cloning of the thioredoxin gene into the AgeI to PstI sites in pBSL-C155 (8) created pBST1. pSTX6 was then generated by cloning the NdeI to RsrII fragment from pBST1 into the same sites in pSTX1. These constructs have DNA encoding Cys-Arg-Ala-Met-Gly-Gly-Arg-Thr-Gly and Met-Arg-Met added to the N and C termini of thioredoxin, respectively.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Protein Purification from TWIN Vectors-ER2566 cells containing the appropriate TWIN plasmid were grown, induced, and pelleted as described for the Mxe GyrA intein (Asn198 → Ala) (7). The pelleted cells were resuspended in Buffer A (20 mM Tris-HCl, pH 8.5, containing 500 mM NaCl). Following sonication of the cell pellet, debris was removed by centrifugation at 23,000 × g for 30 min. This clarified supernatant was applied to a chitin resin (bed volume, 15 ml) equilibrated in Buffer A. Cleavage of Intein 1 (see Fig. 1), Ssp mini-intein (Cys1 → Ala), was initiated by equilibrating the chitin column in Buffer B (20 mM Tris, pH 7.0, containing 500 mM NaCl) and proceeded for 20 h at room temperature, after which the resin was washed with 10 column volumes of Buffer B. Thiol-induced cleavage of Intein 2, Mxe GyrA intein (Asn198 → Ala), was performed by equilibrating the chitin resin in Buffer C (50 mM Tris, pH 8.5, containing 100 mM 2-mercaptoethanesulfonic acid (MESNA) and 250 mM NaCl) and incubating overnight at 4°C. The released target protein was eluted from the chitin resin using Buffer C. Purification with pSTX1 omitted the Intein 1 cleavage step, as this occurred in vivo, and Buffer D (50 mM Tris, pH 7.4, containing 30 mM NH2OH and 500 mM NaCl) replaced Buffer C.
Total yields of protein from pSTX1, pSTX6, pBSTXB1, and pBSMXB1 were 5-15 mg/liter of cell culture when MESNA was used to induce cleavage of Intein 2. However, the yields were 0.5-1 mg/liter of cell culture when NH2OH replaced MESNA as the cleavage reagent. Yields of peptides from pBBP1, pRGD1, and pCDR1 were 20-50 μg/liter of cell culture. Thioredoxin concentrations were calculated by measuring the absorbance at 280 nm and using a molar absorptivity of 14,100. MBP concentrations were determined using the Bio-Rad protein assay with bovine serum albumin as the standard. Peptide concentrations were determined by comparing the absorbance at 214 nm of the HPLC-eluted peptide peaks with that of a 13-amino acid peptide, NH2-THRFFANNILVHN-COOH, of known concentration. The peptide solution was bound to a Vydac 218TP51 column and eluted with a 1 to 75% acetonitrile gradient; all HPLC solutions contained 0.1% trifluoroacetic acid.
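The absorbance-based estimate of thioredoxin concentration follows the Beer-Lambert relation, c = A / (ε · l). The sketch below is a minimal illustration, assuming that the stated molar absorptivity of 14,100 is in units of M⁻¹ cm⁻¹ and that a 1-cm path length was used; neither is specified in the text, and the function name is hypothetical.

```python
# Molar absorptivity of thioredoxin at 280 nm as stated in the text.
# ASSUMPTION: units are M^-1 cm^-1 (not given explicitly in the text).
THIOREDOXIN_EPSILON_280 = 14_100.0


def molar_concentration(a280, epsilon=THIOREDOXIN_EPSILON_280, path_cm=1.0):
    """Beer-Lambert estimate of molar concentration: c = A / (epsilon * l).

    ASSUMPTION: a 1-cm cuvette path length unless another value is passed.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical reading, for illustration only.
    reading = 0.141
    print(f"A280 = {reading} corresponds to {molar_concentration(reading) * 1e6:.1f} uM")
```

Converting to mg/ml additionally requires the molecular mass of the particular thioredoxin construct, which depends on the residues added at each terminus.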
Protein Circularization, Polymerization, and Ligation-On-column protein cyclization reactions occurred when MESNA was used to induce cleavage of Intein 2 as described under "Protein Purification from TWIN Vectors." Following elution from the chitin resin, the cyclic proteins were investigated with either 10-20% Tricine gels (Novex) or MALDI-TOF mass spectrometry (a PerSeptive Biosystems Voyager-DE Biospectrometry workstation). Incomplete cyclization of thioredoxin allowed multimerization of noncircular molecules. Multimerization was accelerated by concentrating the freshly purified protein in a Centriprep 10 followed by a Centricon 10 concentration apparatus (Amicon) to a final total protein concentration of 54 mg/ml. Ligation of thioredoxin to maltose binding protein utilized thioredoxin derived from pSTX1 and NH2OH as the cleaving reagent. The thioredoxin was extensively dialyzed against Buffer E (10 mM Tris, pH 7.4, containing 100 mM NaCl) to remove unreacted NH2OH. The dialyzed thioredoxin was mixed with freshly isolated thioester-tagged MBP, purified using plasmid pMRB10G as described previously (14), and allowed to react overnight at 4°C. The final concentrations of the ligating species were 4.5-13.7 mg/ml thioredoxin and 1-3 mg/ml thioester-tagged MBP. Polymerization and ligation reactions were visualized by SDS-PAGE on 12% Tris-glycine or 10-20% Tricine gels (Novex) stained with Coomassie Brilliant Blue. Following multimerization, the reaction was subjected to SDS-PAGE using 10-20% Tricine gels. The bands were blotted onto nitrocellulose, and the three fastest migrating species were subjected to amino acid sequencing using a Procise 494 protein sequencer (PE Applied Biosystems, Foster City, CA).
Proteolysis and Sequencing of Circular Proteins-Plasmids pBSTXB1 and pBSMXB1 encode thioredoxin and MBP, respectively, with an FXa site 5 amino acids from the predicted C terminus. Expression of these genes generated both linear and circular forms of the protein, which were treated with FXa (1:20, FXa:protein mass ratio) overnight at 4°C. The proteolyzed proteins were run on 10-20% Tricine gels and blotted onto nitrocellulose, and the individual bands were subjected to amino acid sequencing.

RESULTS
Purification of Proteins Using the TWIN System-The TWIN system places an intein at both the N and the C termini of a target protein (Fig. 1). The intein at the N terminus of the target protein, Intein 1, was the Ssp mini-intein (Cys1 → Ala), which contains a mutation that blocks protein splicing but allows cleavage at its C terminus (8). Intein 2, at the C terminus of the target protein, was the Mxe GyrA intein (Asn198 → Ala) that undergoes thiol-inducible cleavage at its N terminus (7). A chitin binding domain present on one or both of the inteins allowed the immobilization of the desired precursor protein on chitin resin, whereas endogenous E. coli proteins could be washed away (Fig. 2A, lane 2). Intein 1 was found to undergo either in vitro (Fig. 2A) or in vivo (Fig. 3, lane 2) cleavage with Cys-Arg or Cys-Gly at the N terminus of the target protein, respectively. In either case, an N-terminal cysteine was generated on the target protein. Cleavage of Intein 2 was induced with either MESNA or NH2OH. Following cleavage of Intein 2 the purified protein was eluted from the chitin resin (Fig. 2A, lane 3, and Fig. 3, lane 4).
Circular Proteins-On-column protein cyclization was achieved when MESNA induced cleavage of Intein 2. This initially produced a thioester at the C terminus of the target protein. This thioester reacted with an N-terminal cysteine present on the same target protein that had been released by cleavage of Intein 1 (Fig. 1). Cyclization reactions of thioredoxin or MBP produced an extra band on SDS-PAGE that migrated at a position unexpected for the linear protein and represented the putative circular form. As anticipated, the extra band was not detected when NH2OH was used to induce cleavage of Intein 2 because this forms an unreactive hydroxamate on the C terminus of the target protein (Fig. 2) and prevents circularization.
Cyclization of thioredoxin (135 aa) and MBP (395 aa) was investigated by incubating these proteins with FXa followed by amino acid sequencing. Proteins expressed from both pBSTXB1 and pBSMXB1 have an FXa site 5 amino acids from their C terminus. FXa treatment of the elution fractions resulted in the disappearance of the putative cyclic protein species when visualized on SDS-PAGE (Fig. 2, B and C). The cyclization reaction occurred to >80%, with 9 and 3 amino acids added to the N and C termini of thioredoxin, respectively (Fig. 2A, lane 3). When thioredoxin and MBP had 3 and 23 residues added to the N and C termini, respectively, cyclization was about 50% (Fig. 2, B and C).
Amino acid sequencing of the FXa-cleaved samples yielded two sequences for thioredoxin, one expected for the linear form, XRAMGDKIIGLTTD (predicted linear form is CRAMGDKIIGLTTD), and the other expected for the FXa-linearized circular form, GTLEGCRAMGDKII, where GTLEG is the sequence expected at the C terminus of thioredoxin and CRAMGDKII is the expected N-terminal sequence. Two sequences were also detected for the FXa-treated cyclization reaction of MBP: XRAMGIEEGKL, which matched the expected N-terminal sequence for the linear MBP (CRAMGIEEGKL), and XTLEGCRAMGI, which agreed with the predicted sequence for the linearized cyclic MBP, where GTLEG is the expected C-terminal sequence of MBP and CRAMGI is the predicted N-terminal sequence. In the amino acid sequencing data, an X is used to indicate that an amino acid could not be assigned for that sequencing cycle.
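The logic behind the sequencing assay can be sketched in a few lines of Python: joining the termini of the linear protein and then cutting once after the FXa recognition sequence (Ile-Glu-Gly-Arg) yields a new N terminus that starts with the former C-terminal residues. In this sketch the N-terminal residues and the C-terminal extension are taken from the text, whereas the interior of thioredoxin is replaced by a hypothetical run of X placeholders; the function name is also hypothetical.

```python
FXA_SITE = "IEGR"  # factor Xa cleaves after the Arg of Ile-Glu-Gly-Arg


def linearize_circular(linear_seq, site=FXA_SITE):
    """Treat linear_seq as head-to-tail cyclized, cut once after `site`,
    and return the resulting linear sequence read from its new N terminus."""
    n = len(linear_seq)
    doubled = linear_seq + linear_seq  # lets the motif span the ligation junction
    idx = doubled.find(site)
    if idx == -1:
        raise ValueError("no cleavage site found")
    cut = (idx + len(site)) % n
    return linear_seq[cut:] + linear_seq[:cut]


# N-terminal residues reported for the thioredoxin construct (from the text).
N_TERM = "CRAMGDKIIGLTTD"
# PLACEHOLDER: the remaining thioredoxin residues are not listed in the text.
CORE = "X" * 20
# C-terminal extension from the text: Ser-Ser-Asn10-Leu-Gly-Ile-Glu-Gly-Arg-Gly-Thr-Leu-Glu-Gly.
C_TAIL = "SS" + "N" * 10 + "LGIEGRGTLEG"

if __name__ == "__main__":
    opened = linearize_circular(N_TERM + CORE + C_TAIL)
    print(opened[:14])  # first 14 residues of the FXa-opened circular form
```

With these inputs the first 14 residues of the opened ring are GTLEGCRAMGDKII, matching the sequence reported for the FXa-linearized circular thioredoxin.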
Circular Peptides-The cyclization of the small peptides BBP (9 aa), RGD (10 aa), and CDR-H3/C2 (14 aa) was confirmed by mass spectrometry (Table I and Fig. 4B). Predicted molecular masses for cyclic BBP, RGD, and CDR-H3/C2 were 977.2, 1120.3, and 2098.3 g/mol, respectively. These agreed well with the experimentally determined values of 977.1, 1119.9, and 2098.7 g/mol, respectively. A linear peptide generated by hydrolysis of the C-terminal thioester was not observed using the MALDI-TOF mass spectrometer. However, when using pBBP1 an extra species was observed with an apparent molecular mass of 1145.3 g/mol. This molecular mass is greater than the expected mass of the MESNA-tagged peptide, 1119.3 g/mol, and may represent the thioester-tagged linear BBP peptide that has undergone either in vitro or in vivo modification. However, this was not verified, and peptides from pRGD or pCDR had no detectable levels of a comparable species.
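The agreement between predicted and measured masses can be made explicit with a short calculation. The Python sketch below uses only the values quoted in this paragraph; the variable and function names are hypothetical, and the parts-per-million figure is added here purely as a convenient way to express the deviation.

```python
# (predicted cyclic mass, observed mass) in g/mol, as quoted in the text.
CYCLIC_PEPTIDE_MASSES = {
    "BBP": (977.2, 977.1),
    "RGD": (1120.3, 1119.9),
    "CDR-H3/C2": (2098.3, 2098.7),
}

# Extra species seen with pBBP1, and the expected MESNA-tagged linear BBP mass.
BBP_EXTRA_SPECIES = 1145.3
BBP_MESNA_TAGGED_EXPECTED = 1119.3


def mass_deviation(predicted, observed):
    """Return (observed - predicted) in g/mol and the same deviation in ppm."""
    delta = observed - predicted
    return delta, delta / predicted * 1e6


if __name__ == "__main__":
    for name, (predicted, observed) in CYCLIC_PEPTIDE_MASSES.items():
        delta, ppm = mass_deviation(predicted, observed)
        print(f"{name}: deviation {delta:+.1f} g/mol ({ppm:+.0f} ppm)")
    excess = BBP_EXTRA_SPECIES - BBP_MESNA_TAGGED_EXPECTED
    print(f"pBBP1 extra species exceeds the MESNA-tagged mass by {excess:.1f} g/mol")
```

All three deviations are within 0.4 g/mol, whereas the unassigned pBBP1 species is 26.0 g/mol heavier than the expected MESNA-tagged linear peptide.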
Protein Polymerization-Polymeric species of thioredoxin were generated by concentrating freshly isolated thioredoxin, purified from cells containing pSTX6, as described under "Experimental Procedures." Multiple bands were visible that corresponded to the expected molecular masses of multimers of thioredoxin (Fig. 2A, lane 5). Amino acid sequencing of l-Trx and Trx2 resulted in XRAMGG; the predicted N terminus for this thioredoxin construct is CRAMGG. No sequence data could be obtained from the band directly below Trx2, indicating that the N terminus of this protein may be blocked. This may represent the cyclized dimer of thioredoxin. The bands above Trx2, which were not sequenced, reacted with an anti-thioredoxin antibody, suggesting that these species represent thioredoxin multimers (data not shown).
Protein-Protein Ligation-NH2OH-induced cleavage of the precursor protein derived from pSTX1 generated thioredoxin with an N-terminal cysteine and a C-terminal hydroxamate. Incubation of this thioredoxin with MBP containing a C-terminal thioester resulted in ligation as determined by the appearance of an extra band on SDS-PAGE that migrated at the position expected for the MBP-thioredoxin fusion (Fig. 3, lane 6).

DISCUSSION
The facile isolation of cyclic proteins possessing a continuous peptide bond backbone is accomplished using one affinity chromatography step with the TWIN system (Fig. 1). The TWIN system sandwiches a target protein between two inteins, one engineered to cleave at its C terminus (Intein 1) and the other modified to undergo thiol-induced N-terminal cleavage (Intein 2). Cleavage of Intein 1 produces an N-terminal cysteine on a bacterially expressed protein, whereas thiol-induced cleavage of Intein 2 can produce a C-terminal thioester on the same protein. These reactive groups undergo spontaneous condensation as described previously (12,13).
Three circular peptides were isolated with the TWIN system. BBP has been reported to target phage to specific organs in mice (20), RGD inhibits platelet aggregation (21), and CDR-H3/C2 has been shown to inhibit HIV-1 replication (22). Previously, these peptides were cyclized by chemical synthesis with an N- and C-terminal cysteine followed by oxidation to form a disulfide bond. However, in the present study these peptides were expressed in E. coli and cyclized with a native peptide bond between their N and C termini, which offers resistance to reducing environments, an important consideration for cyclic compounds that work intracellularly. Furthermore, these peptides lack both N and C termini, should be resistant to exoproteases, and may form more stable drugs for use in serum. Novel work by Muir and co-workers (25) has demonstrated the cyclization of synthetic peptides through a peptide bond. However, chemical synthesis is currently limited to peptides of about 100 amino acids.
Use of a bacterial expression system circumvents this limitation as demonstrated by the cyclization of thioredoxin (135 aa) and MBP (395 aa). Also, because inteins are used to generate both reactive groups, there is no need for proteases, which have been previously used to release an N-terminal cysteine for ligation reactions (18,19). Proteases require an extra processing step, may cleave an undesired site within the target protein, and often must be removed or inactivated following treatment.
Proteins with complementary reactive groups not only can cyclize, but also may polymerize. Many proteins of structural importance, such as silks and collagen, are formed of blocks of repeating amino acid sequences. The expression of the repeating unit in monomeric form followed by in vitro polymerization may allow more rapid investigations of these compounds for the development of novel biomaterials. Furthermore, work by Zhang et al. (26,27) has demonstrated that the interaction of small peptides can result in the formation of peptide membranes. The mechanical properties of these biomaterials may be altered or improved by polymerizing the peptides before assembly. However, an obstacle in the present study was the cyclization of thioredoxin, which then becomes unreactive to polymerization (Fig. 2). The extent of this cyclization depended on the extra amino acids added to thioredoxin. Protein structure, flexibility, and propensity to self-associate are all factors that affect whether a protein will cyclize or polymerize, and the investigation of these factors may allow the reaction to be biased one way or the other.
The effect of the extein residue adjacent to an intein has been observed previously (8,17,28). However, in the present study the C-terminal cleavage of the Ssp mini-intein (Cys1 → Ala) occurs in vivo with Cys-Gly following the intein (Fig. 3, lane 2), but almost no in vivo cleavage is observed with Cys-Arg (Fig. 2A, lane 1). This indicates that extein sequences at least two amino acids from the scissile peptide bond can have dramatic effects on intein activity and further demonstrates that cleavage can be modulated without changing the intein itself.
[Fig. 4 legend, partial] The precursor proteins, CBD-Ssp (Cys1 → Ala)-target-Mxe (Asn198 → Ala)-CBD in which the target is RGD, BBP, and CDR-H3/C2, from pRGD1, pBBP1, and pCDR1, respectively, have predicted molecular masses of 50.5, 50, and 51.5 kDa, respectively. In vivo cleavage of the precursors results in CBD-Ssp (Cys1 → Ala) (22 kDa) as well as RGD-Mxe (Asn198 → Ala)-CBD (28.5 kDa) and BBP-Mxe (Asn198 → Ala)-CBD (28 kDa) from pRGD1 and pBBP1, respectively. The Ssp mini-intein migrates on SDS-PAGE as a protein 5 kDa larger than predicted and affects fusion proteins in which it is present in a similar manner. B, MALDI-TOF mass spectrometry of the cyclized BBP peptide. Expected molecular mass of the circular form of the peptide is 977.2 g/mol.
It is interesting that both the Ssp mini-intein (Cys1 → Ala) and Mxe GyrA intein (Asn198 → Ala) are active when separated by a linker of only 9 amino acids (see pBBP1), considering that a short linker might constrain the folding of one or both of the inteins and inhibit the cleavage reaction. Future work should determine the minimal linker that permits the proper functioning of both inteins.
In conclusion, the TWIN system permits the facile production of bacterially expressed proteins with an N-terminal cysteine and a C-terminal thioester for use in IPL reactions. This technology will allow the investigation of large circular proteins and may allow the generation of large proteins composed of repeating units, analogous to silk proteins from arachnids or insects. In the future, controllable cleavage of Intein 1 in the TWIN system will permit the ligation of three or more protein fragments in succession.