Novel Split Intein for trans-Splicing Synthetic Peptide onto C Terminus of Protein*

Conventional split inteins have been useful for trans-splicing between recombinant proteins, and an artificial S1 split intein is useful for adding synthetic peptide onto the N terminus of recombinant proteins. Here we have engineered a novel S11 split intein for trans-splicing synthetic peptide onto the C terminus of recombinant proteins. The C-intein of the S11 split intein is extremely small (6 amino acids (aa)); thus it can easily be produced together with a synthetic C-extein to be added to the C terminus of target proteins. The S11 intein was derived from the Ssp GyrB intein after deleting the homing endonuclease domain and splitting the remaining intein sequence near the C terminus, producing a 150-aa N-intein (IN) and a 6-aa C-intein (IC). Its trans-splicing activity was demonstrated first in Escherichia coli cells and then in vitro for trans-splicing between a synthetic peptide and a recombinant protein. The in vitro trans-splicing reaction exhibited a typical rate constant of (6.9 ± 2.2) × 10⁻⁵ s⁻¹ and reached a high efficiency of ∼80%. This S11 split intein can be useful for adding any desirable chemical groups to the C terminus of a protein of interest, which may include modified and unnatural amino acids, biotin and fluorescent labels, and even drug molecules.

Inteins are internal protein sequences that catalyze a protein-splicing reaction, which precisely excises the intein sequence and joins the flanking sequences (N- and C-exteins) with a peptide bond (1). The reaction mechanism of protein splicing has been well studied (2,3), and the conserved crystal structure of the protein-splicing domain of inteins consists of ∼12 β-strands that form a disk-like compact structure with the two splicing junctions located in a central cleft (4-7). More than 400 inteins and intein-like sequences have been found in a wide variety of host proteins and in microorganisms including bacteria, Archaea, and eukaryotes (8,9). Inteins are typically 350-550 aa in size (8,9), with the majority containing a homing endonuclease domain. Some inteins are as large as 1650 aa in size and contain tandem repeats (10,11). Inteins can lose the endonuclease domain, and the resulting functional mini-intein retains only the splicing domain, which is ∼140 aa in size (12-14).
Split inteins are essentially mini-inteins broken into two pieces that are able to reassociate and carry out protein trans-splicing (15). In certain cyanobacteria, a natural split intein is responsible for producing a mature DnaE protein by trans-splicing two separate polypeptides expressed from two separate genes (16,17). Artificial split inteins have also been engineered by splitting the sequences of contiguous inteins to resemble naturally occurring split inteins (13,18,19). Split inteins have many practical uses, including the production of recombinant proteins from fragments and the circularization of recombinant proteins (20). However, conventional split inteins are less useful for adding synthetic peptide to proteins because their split sites are relatively close to the middle of the intein sequence and correspond to the insertion site of the homing endonuclease domain. These split sites are referred to as S0 and split the intein sequence into an N-terminal piece of ∼100 aa and a C-terminal piece of ∼40 aa (21). These intein pieces are prone to misfolding, and their relatively large sizes make them difficult to produce through chemical synthesis, particularly after adding desired extein sequences.
Novel split inteins having a split site proximal to the N or C terminus would have expanded utility because they can be used to trans-splice a short synthetic peptide onto recombinant proteins. Recently, an S1 split intein was engineered from the Ssp DnaB intein by splitting the intein sequence at an S1 site proximal to the N terminus, producing an N-terminal piece (N-intein) of only 11 aa in length and a C-terminal piece (C-intein) of 144 aa in length (21). As the short N-intein can be synthesized chemically with an additional N-extein peptide containing desirable chemical groups, this S1 split intein has been used successfully to trans-splice a fluorophore onto the N terminus of recombinant proteins (22). However, this S1 split intein is not suitable for adding synthetic peptide to the C terminus of proteins because the 144-aa C-intein is too large for chemical synthesis. A previous attempt to produce split inteins having a small C-intein was not successful (21).
Here we report the successful engineering of a novel S11 split intein for adding synthetic peptide to the C terminus of recombinant proteins. A total of nine natural inteins were first converted to mini-inteins and then split at the S11 split site near the C terminus of the intein sequence, producing an N-intein of ∼150 aa in length and a C-intein of only 6 aa in length. Among all the inteins tested, an S11 split intein derived from the Ssp GyrB intein was first shown to trans-splice efficiently in Escherichia coli cells. It was further shown to efficiently trans-splice a synthetic peptide with a recombinant protein in vitro. Kinetic analysis of the trans-splicing reaction demonstrated a high efficiency (∼80%) and a rate constant comparable with those of the conventional and the S1 split inteins. This novel S11 split intein significantly expands the utility of split inteins, providing the potential to add any number of desirable chemical groups onto the C terminus of target proteins.

* This work was supported by a research grant from the Canadian Institutes of Health Research. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Both authors contributed equally to this work.
2 To whom correspondence should be addressed. Fax: 902-494-1355; E-mail: Paul.Liu@Dal.Ca.
3 The abbreviations used are: aa, amino acids; IPTG, isopropyl-1-thio-β-D-galactopyranoside; DTT, dithiothreitol; TCEP, tris(2-carboxyethyl)phosphine.

EXPERIMENTAL PROCEDURES
Plasmid Construction-The Ssp GyrB mini-intein was constructed by fusing the N- and C-terminal coding sequences of the Ssp GyrB intein (8). The N-terminal segment was PCR-amplified from Synechocystis sp. PCC6803 genomic DNA using a pair of oligonucleotide primers (5′-CTC GAG GGC GGT TGT TTT TCT GGA GAT ACA TTA GTC GC-3′ and 5′-CAT ATG ACC AGA ATC TTC CGT AGT CGA AAT-3′). The C-terminal segment was obtained similarly, using the primer pair 5′-CAT ATG GAA GCA GTA TTA AAT TAC AAT CAC AG-3′ and 5′-GAC CGG TCT CGC CAG CGC TGT TAT GGA CAA ACA CTC-3′. These segments were digested with appropriate restriction enzymes (XhoI, NdeI, and AgeI) and were placed in the pMST plasmid (21) between the XhoI and AgeI sites, creating the pMSG plasmid. This plasmid contains a maltose-binding protein as the N-extein and thioredoxin as the C-extein. DNA sequencing was used to confirm the sequence of all plasmids.
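As an illustrative check of the cloning design, the short Python sketch below searches the four primers for the recognition sites of the three enzymes named above. The recognition sequences (XhoI, CTCGAG; NdeI, CATATG; AgeI, ACCGGT) are the standard ones and are not stated in the text; the primer labels and the `find_sites` helper are hypothetical names introduced only for this example.

```python
# Standard recognition sites for the enzymes named in the text (assumed here,
# not quoted from the article).
SITES = {"XhoI": "CTCGAG", "NdeI": "CATATG", "AgeI": "ACCGGT"}

# Primer sequences copied from the text; labels are illustrative only.
PRIMERS = {
    "N-terminal segment, primer 1": "CTC GAG GGC GGT TGT TTT TCT GGA GAT ACA TTA GTC GC",
    "N-terminal segment, primer 2": "CAT ATG ACC AGA ATC TTC CGT AGT CGA AAT",
    "C-terminal segment, primer 1": "CAT ATG GAA GCA GTA TTA AAT TAC AAT CAC AG",
    "C-terminal segment, primer 2": "GAC CGG TCT CGC CAG CGC TGT TAT GGA CAA ACA CTC",
}


def find_sites(primer):
    """Return {enzyme: 0-based position} for every site present in the primer."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for label, primer in PRIMERS.items():
    print(label, find_sites(primer))
```

Under these assumptions, each primer carries exactly one of the three sites at or near its 5′ end, which is consistent with joining the two segments at NdeI and inserting the fused product between XhoI and AgeI.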
To create the Ssp GyrB split intein plasmid pMSG-S11, a spacer sequence was inserted into the mini-intein coding sequence at the S11 split site by inverse PCR as described previously (21). The spacer DNA sequentially contains a stop codon, a ribosome binding site, and a start codon and has the following sequence: 5′-TAA TTA ACT TAT AAG GAG GAA AAA CAT ATG-3′. This creates a two-gene operon where the N-protein containing the maltose-binding protein and N-intein is followed by the C-protein containing the C-intein and thioredoxin.
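The three elements of the spacer can be located with a few lines of Python. This is only a sketch: the Shine-Dalgarno core `AGGAGG` is an assumed consensus motif (the text says only "ribosome binding site"), and `describe_spacer` is a hypothetical helper written for this illustration.

```python
# Spacer sequence copied from the text, with codon spacing removed.
SPACER = "TAA TTA ACT TAT AAG GAG GAA AAA CAT ATG".replace(" ", "")

STOP_CODONS = {"TAA", "TAG", "TGA"}
SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno consensus core, not given in the text


def describe_spacer(seq):
    """Locate the stop codon, the assumed SD core, and the start codon."""
    sd_pos = seq.find(SD_CORE)
    start_pos = len(seq) - 3
    return {
        "stop_codon": seq[:3] if seq[:3] in STOP_CODONS else None,
        "sd_position": sd_pos,                           # 0-based index of the SD core
        "start_codon": seq[start_pos:],                  # last three nucleotides
        "sd_to_start_gap": start_pos - (sd_pos + len(SD_CORE)),  # nt between SD and ATG
    }


print(describe_spacer(SPACER))
```

The spacer begins with a TAA stop codon that terminates the N-protein, and ends with the ATG that starts the C-protein; the assumed SD core lies 8 nucleotides upstream of that ATG.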
A plasmid (pMSG-S11inv) with the N- and C-proteins in the reverse order was modified from pMSG-S11. This was done through circularization of the PCR-amplified two-gene operon from pMSG-S11 followed by linearization at the S11 split site and ligation into a modified pMST plasmid (lacking extein proteins) at introduced EcoRI and BamHI sites. To produce a plasmid expressing the N-protein only (pMINH), the coding sequence of a His tag (six histidines) was added to the 3′ end of the IN coding sequence through inverse PCR using the primer pair 5′-CAT CAC CAC CAT CAC CAT TAA TTA ACT TAT AAG GAG GAA AAA CAT ATG-3′ and 5′-CGT TGC CAA AGC AAA ATT GTG-3′, and the C-protein was deleted from the pMSG-S11 plasmid through digestion with NdeI-HindIII followed by filling in and blunt-end ligation.
Protein Splicing in E. coli cells-Each expression plasmid was introduced into E. coli cells (DH5α strain). Subsequent protein expression, gel electrophoresis, and Western blotting were carried out as described previously (21). Briefly, cells were grown to late log phase (A600 = 0.5); IPTG was added to a final concentration of 0.8 mM to induce protein expression for 3 h at 37°C or for ∼16 h at room temperature; cells were harvested and lysed in an SDS- and DTT-containing buffer in a boiling water bath; total cellular proteins were analyzed through SDS-PAGE; and protein bands were visualized by staining with Coomassie Brilliant Blue R-250. Western blotting was carried out with either anti-maltose-binding protein monoclonal antibody (New England Biolabs) or anti-thioredoxin monoclonal antibody (Invitrogen), using the enhanced chemiluminescence detection kit (GE Healthcare). The intensity of the protein band was estimated using a gel documentation system (Gel Doc 1000 coupled with Molecular Analyst software, Bio-Rad).
In Vitro Protein-Peptide trans-Splicing-The N-precursor protein (MINH) was expressed in E. coli as above and affinity-purified using amylose resin according to the manufacturer's instructions (New England Biolabs). The synthetic peptide (ICF) was purchased from EZBiolab. For in vitro trans-splicing, 20 μM MINH protein was incubated with 200 μM ICF peptide in a splicing buffer (20 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA; pH 8.0) in the presence or absence of 0.1 mM TCEP or 1.0 mM DTT at room temperature for a specified length of time. Splicing products were analyzed through SDS-PAGE in the presence of DTT and visualized either by staining with Coomassie Brilliant Blue R-250 or by Western blotting using an anti-maltose-binding protein antibody (see above), an anti-FLAG antibody (Sigma), or an anti-His tag antibody (Roche Applied Science).

RESULTS
Construction of the S11 Split Inteins-Nine natural inteins were tested to find a functional split site proximal to the C terminus of the intein sequence. They were the Ssp GyrB, Tth DnaE-1, Tth DnaE-2, Tth RIR, CneA Prp8, Ter RIR-1, Ter RIR-2, Ter RIR-3, and Ter RIR-4 inteins. These natural inteins all have contiguous sequences (8), and the last five have previously been shown to be active in cis-splicing (23,24). Working from the coding sequence of each intein, the homing endonuclease domain sequence (if present) was deleted to produce a mini-intein, based on previously described domain predictions (8,25,26). Each mini-intein sequence was then split at the S11 site to produce the split intein. The S11 site was chosen through protein sequence alignments with inteins of known crystal structures such that the S11 site was near the C terminus and between the last two β-strands, β11 and β12. An example of the sequence alignment is shown in Fig. 1, where the Ssp GyrB mini-intein is aligned to the Ssp DnaB mini-intein, which has a known crystal structure (7).
The S11 split intein constructs were expressed in E. coli using the previously described pMST plasmid (21). As illustrated in Fig. 2A, each intein was flanked by a maltose-binding protein (M) as the N-extein and a thioredoxin (T) as the C-extein, expressed from an IPTG-inducible Ptac promoter. For the S11 split intein, the intein coding sequence was split at the S11 site to create a two-gene operon by adding a spacer sequence containing sequentially a stop codon, a ribosome binding site (Shine-Dalgarno sequence), and a start codon. In this operon, the first gene encoded the N-precursor protein consisting of the M sequence (N-extein) followed by the N-terminal intein piece (N-intein or IN), and the second gene encoded the C-precursor protein consisting of the C-terminal intein piece (C-intein or IC) followed by the T sequence as C-extein. Similar gene constructs had been used in previous studies of split inteins (21,23,24), which allowed easy expression and identification of the protein products.
S11 Split Intein MARCH 6, 2009 • VOLUME 284 • NUMBER 10

JOURNAL OF BIOLOGICAL CHEMISTRY 6195
Splicing Activities in E. coli-The above plasmids were introduced into E. coli cells to produce the corresponding proteins after IPTG induction, the resulting total cellular proteins were resolved through SDS-PAGE, and relevant protein bands were identified by their predicted sizes and through Western blotting. Proteins were quantified by measuring the intensity of the corresponding signal on Western blots, and the efficiency of splicing was calculated as the ratio of the spliced protein over the total protein (the spliced protein plus the remaining precursor protein). The C-precursor protein was used in estimating the efficiency of trans-splicing because the N-precursor protein existed in excessive amounts due to its higher expression level as the first gene of the two-gene operon, with the exception of the pMSG-S11inv plasmid.
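The efficiency calculation described above can be written as a one-line ratio. The following Python sketch is illustrative only: the function name `splicing_efficiency` and the band intensities in the example are hypothetical and are not measurements from this study.

```python
def splicing_efficiency(spliced, precursor):
    """Efficiency = spliced / (spliced + remaining precursor).

    Both arguments are band intensities in the same arbitrary units,
    e.g. the MT band and the remaining C-precursor band on one blot.
    """
    total = spliced + precursor
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return spliced / total


# Hypothetical intensities (arbitrary units), chosen only to show the arithmetic.
example = splicing_efficiency(spliced=3.0, precursor=1.0)
print(f"{example:.0%}")
```

Because both species are detected with the same antibody on the same blot, the ratio does not depend on the absolute scale of the intensity readings.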
Among all the S11 split intein constructs tested, only the Ssp GyrB S11 split intein showed a trans-splicing activity in E. coli. The spliced protein (MT) was clearly observed after Coomassie Blue staining and confirmed by Western blotting (Fig. 2B), indicating that protein trans-splicing had occurred. A certain amount of the C-precursor protein (ICT) remained, indicating that the trans-splicing reaction did not go to completion. The efficiency (completeness) of the trans-splicing was estimated to be 84% at room temperature but only 16% at 37°C. The remote possibility of cis-splicing via translational readthrough of the two-gene operon was also investigated. When the gene order in the two-gene operon was reversed in plasmid pMSG-S11inv, making cis-splicing topologically impossible, the spliced protein was still observed (Fig. 2B), thereby ruling out this possibility. Additional tested S11 split inteins, derived from the other eight natural inteins besides the Ssp GyrB intein (see above), did not show a detectable amount of trans-splicing under the tested conditions (data not shown).

FIGURE 2. [...] The recombinant C-precursor protein consists of the 6-aa C-intein sequence (IC) followed by a thioredoxin sequence (T). Coding sequences of these precursor proteins were placed in an expression plasmid as a two-gene operon behind the IPTG-inducible Ptac promoter. In plasmid pMSG-S11, the N-precursor gene precedes the C-precursor gene. In plasmid pMSG-S11inv, the C-precursor gene precedes the N-precursor gene. B, SDS-PAGE analysis of the trans-splicing reaction. Protein bands were visualized by Coomassie Blue staining or by Western blotting using an anti-T antibody against the thioredoxin sequence, as indicated. Positions are marked for the N-precursor protein (MIN), the C-precursor protein (ICT), and the spliced protein (MT). Lanes 1-3 are total proteins of E. coli cells before the IPTG induction of precursor protein expression, after IPTG induction for 3 h at 37°C, and after IPTG induction overnight at room temperature, respectively. The E. coli cells harbored either the pMSG-S11 plasmid or the pMSG-S11inv plasmid as indicated.

Protein-Peptide trans-Splicing in Vitro-The Ssp GyrB S11 split intein was tested in vitro to trans-splice a recombinant N-precursor protein with a synthetic peptide (Fig. 3A). The N-precursor protein MINH consisted of a maltose-binding protein (M), the N-intein (IN), and a His tag sequence (H). The synthetic peptide (ICF) is 30 aa in length with the following sequence: GVFVHNSADYKDDDDKSGCLAGDTLITLAS. It contains the IC sequence (GVFVHN) followed by a C-extein, and the C-extein sequence starts with a serine residue required for splicing and contains the FLAG epitope (DYKDDDDK) for easy detection. If the trans-splicing occurs, the FLAG-containing C-extein (F) will be transferred onto the C terminus of the maltose-binding protein (M), producing the spliced protein (MF). Simultaneously, the N-intein piece (INH) will be released from the precursor protein, and the C-intein (IC) will be released from the synthetic ICF peptide.
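The layout of the synthetic peptide can be checked programmatically. The Python sketch below splits the peptide into the C-intein and the C-extein and confirms the features described above; the `dissect` helper and its dictionary keys are hypothetical names used only for this illustration, and the set of permitted first residues (Ser, Cys, Thr) follows the requirement stated in the Discussion.

```python
PEPTIDE = "GVFVHNSADYKDDDDKSGCLAGDTLITLAS"  # ICF peptide, from the text
C_INTEIN = "GVFVHN"                          # 6-aa IC sequence
FLAG = "DYKDDDDK"                            # FLAG epitope
NUCLEOPHILIC = {"S", "C", "T"}               # Ser, Cys, or Thr


def dissect(peptide, c_intein, epitope):
    """Split a peptide into C-intein and C-extein and report key features."""
    if not peptide.startswith(c_intein):
        raise ValueError("peptide does not begin with the C-intein sequence")
    extein = peptide[len(c_intein):]
    return {
        "c_extein": extein,
        "first_residue_ok": extein[0] in NUCLEOPHILIC,
        "epitope_offset": extein.find(epitope),  # 0-based position within the C-extein
    }


info = dissect(PEPTIDE, C_INTEIN, FLAG)
print(len(PEPTIDE), info)
```

The 30-aa peptide therefore consists of the 6-aa C-intein and a 24-aa C-extein that begins with serine and carries the FLAG epitope two residues after that serine.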
The predicted products of the above protein-peptide trans-splicing were observed after Coomassie Blue staining and identified through Western blotting (Fig. 3B). As expected, the spliced protein MF was recognized by both anti-M and anti-F antibodies, the N-terminal intein piece INH was recognized only by the anti-H antibody, and the N-precursor protein MINH was recognized by both anti-M and anti-H antibodies. This trans-splicing reaction was enhanced by the addition of reducing agents (TCEP and DTT), as indicated by the increased amounts of the spliced protein (Fig. 3B). Over 80% of the N-precursor protein was converted into the spliced protein in the presence of 1 mM DTT after 11 h of incubation with the synthetic peptide (Fig. 4A). A minor protein band corresponding to the M part of the N-precursor protein (MINH) was also observed, suggesting that a small amount of the N-precursor protein underwent N-cleavage (i.e. peptide bond breakage at the N terminus of the IN sequence) instead of trans-splicing.
The rate constant of the above trans-splicing reaction was also determined for comparison with conventional split inteins. The purified N-precursor protein was incubated with a 10-fold molar excess of the ICF peptide to achieve a pseudo-first-order reaction with respect to the N-precursor protein. The reaction products were resolved through SDS-PAGE, visualized by Coomassie Blue staining, and quantified through laser scanning and gel documentation. The amount of the spliced product was calculated as a percentage of the starting amount of N-precursor protein. The percentage of spliced product formation was plotted against time from a series of measurements over a 31-h period (Fig. 4B). The plot was fitted to the pseudo-first-order reaction equation p = P0(1 − e^(−kt)) (28), and the rate constant (kobs) was determined to be (6.9 ± 2.2) × 10⁻⁵ s⁻¹.
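To make the fitted model concrete, the Python sketch below evaluates the pseudo-first-order equation at a few time points using the reported rate constant. It is not the authors' fitting procedure: the function names are hypothetical, and P0 is interpreted here as the plateau percentage of spliced product, which is an assumption about the symbol rather than a statement from the text.

```python
import math

K_OBS = 6.9e-5  # s^-1, rate constant reported in the text


def fraction_of_plateau(t_seconds, k=K_OBS):
    """Return 1 - exp(-k t), the fraction of the final yield reached at time t."""
    return 1.0 - math.exp(-k * t_seconds)


def spliced_percent(t_seconds, p0, k=K_OBS):
    """Return p = P0 * (1 - exp(-k t)); p0 is the assumed plateau percentage."""
    return p0 * fraction_of_plateau(t_seconds, k)


# Time points in hours, spanning the 31-h window mentioned in the text.
for hours in (1, 5, 11, 31):
    frac = fraction_of_plateau(hours * 3600)
    print(f"{hours:>2} h: {frac:.3f} of plateau")
```

With this rate constant the model reaches roughly 93% of its plateau after 11 h, so measurements taken over a 31-h period cover essentially the entire approach to the plateau.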

DISCUSSION
We have succeeded for the first time in constructing a novel S11 split intein capable of protein trans-splicing after testing nine different natural inteins by splitting the intein sequences at the S11 site near the C terminus. For a trans-splicing reaction to occur, the two precursor proteins must be able to recognize and associate with one another. The exteins used in this study, maltose-binding protein and thioredoxin, do not interact structurally and cannot play a role in bringing together the two precursor proteins. Therefore the intein fragments (IN and IC) must be solely responsible for the recognition and association between the two precursor proteins in vivo or between the N-precursor protein and the short peptide in vitro. Interestingly, the functional association between the IN and the IC appears to be a rare property of the S11 split intein derived from the Ssp GyrB intein because similarly constructed S11 split inteins derived from nine other natural inteins did not show a trans-splicing activity in this study and in a previous report (21).
It is striking that the 6-aa IC could associate with the 150-aa IN to reconstitute an active intein. The structural basis of this IN-IC association may be considered in terms of the highly conserved crystal structures of inteins in general, although no crystal structure is available for the Ssp GyrB intein. The highly conserved mini-intein structure typically consists of ∼12 β-strands (4-7), and the 6-aa IC sequence corresponds to the last β-strand, β12, in the Ssp DnaB mini-intein (Fig. 1). In the disk-like structure of the mini-intein, β12 is located inside the centrally positioned catalytic pocket of the intein, and it interacts directly with β11 to form a short anti-parallel β-sheet. This may explain the intermolecular association between IN and IC of the S11 split intein because the IC contains β12, whereas the IN contains β11 and other parts of the catalytic pocket.

FIGURE 3. Protein-peptide trans-splicing in vitro using the S11 split intein. A, schematic illustration of the trans-splicing reaction. The recombinant N-precursor protein consists of a maltose-binding protein sequence (M) followed by the 150-aa N-intein sequence (IN) and a His tag (six histidines) sequence (H). After expression in E. coli, the N-precursor protein was affinity-purified using its maltose-binding protein part. The chemically synthesized peptide consists of the 6-aa C-intein (IC) followed by a C-extein sequence (F) that begins with a Ser residue and contains the FLAG epitope. B, SDS-PAGE analysis of the trans-splicing reaction. Protein bands were visualized by Coomassie Blue staining, Western blot using an anti-M antibody against the maltose-binding protein sequence, Western blot using an anti-H antibody against the His tag sequence, or Western blot using an anti-F antibody against the FLAG epitope, as indicated. Positions are marked for the N-precursor protein (MINH), the spliced protein (MF), and the released N-intein (INH). The N-precursor protein was incubated for 16 h at room temperature alone, with the ICF peptide, with the ICF peptide plus 0.1 mM TCEP, or with the ICF peptide plus 1 mM DTT, as indicated.
The in vitro trans-splicing conditions allowed for an analysis of the reaction kinetics of the Ssp GyrB S11 split intein. The observed rate constant of (6.9 ± 2.2) × 10⁻⁵ s⁻¹ is comparable with previously reported rate constants of the conventional and the S1 split inteins. The Ssp DnaB S1 split intein, whose split site is near the N terminus of the intein (21), exhibited a rate constant of (4.1 ± 0.2) × 10⁻⁵ s⁻¹ (22). Conventional split inteins, whose split sites are near the middle of the intein and correspond to the insertion site of the homing endonuclease domain, exhibited rate constants ranging from ∼10⁻⁴ s⁻¹ for the synthetic Ssp DnaB split intein (27) to ∼10⁻⁵ s⁻¹ for the naturally occurring Ssp DnaE split intein (28).
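One way to compare these rate constants is to convert each to a first-order half-life, t1/2 = ln 2 / k. The Python sketch below does this for the values quoted above; the dictionary labels and the `half_life_hours` helper are hypothetical, and the two conventional split intein entries are order-of-magnitude values only, as in the text.

```python
import math

# Rate constants (s^-1) quoted in the text; the last two are order-of-magnitude values.
RATE_CONSTANTS = {
    "Ssp GyrB S11 split intein": 6.9e-5,
    "Ssp DnaB S1 split intein": 4.1e-5,
    "Ssp DnaB conventional split intein (approx.)": 1e-4,
    "Ssp DnaE natural split intein (approx.)": 1e-5,
}


def half_life_hours(k):
    """First-order half-life in hours for a rate constant k given in s^-1."""
    return math.log(2) / k / 3600.0


for name, k in RATE_CONSTANTS.items():
    print(f"{name}: t1/2 = {half_life_hours(k):.1f} h")
```

On this scale the S11 split intein has a half-life of about 2.8 h and the S1 split intein about 4.7 h, both of which fall between the approximate values for the two conventional split inteins.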
In addition to trans-splicing, a small amount of N-cleavage (at the N terminus of IN) was observed for the Ssp GyrB S11 split intein when DTT was present. This is not surprising because strong nucleophilic compounds such as DTT are known to resolve ester bonds formed during the first two steps (N-S/N-O acyl shift and transesterification) of protein splicing, which cleaves the N-extein off the precursor protein. However, only ∼10% N-cleavage occurred along with the ∼75% trans-splicing, which compares favorably with the Ssp DnaB S1 split intein that showed ∼40% cleavage along with ∼40% trans-splicing under similar in vitro conditions (22).
The demonstration of in vitro trans-splicing between a protein and a peptide, using the Ssp GyrB S11 split intein, potentially presents a new and useful method for protein research and biotechnology. A synthetic peptide can easily be made to contain the 6-aa IC and a C-extein to be spliced onto the C terminus of an IN-containing target protein. The C-extein in this study contained a FLAG epitope for easy analysis but can potentially contain any synthetic moiety one might wish to splice onto the C terminus of a target protein. For example, the synthetic C-extein may contain biotin, fluorescent groups, modified or unnatural amino acids, and drug molecules. The only known requirement is that the C-extein begins with a nucleophilic amino acid residue (Ser, Cys, or Thr), although it is possible that other features of the C-extein might influence the efficiency of the trans-splicing reaction. The ∼80% trans-splicing efficiency of the Ssp GyrB S11 split intein is comparable with or higher than previously reported trans-splicing efficiencies of conventional and S1 split inteins. Unlike previously reported split inteins, the Ssp GyrB S11 split intein may be uniquely or most suitable for splicing synthetic chemical groups onto the C terminus of recombinant proteins.