The in Vitro Ligation of Bacterially Expressed Proteins Using an Intein from Methanobacterium thermoautotrophicum

The smallest known intein, found in the ribonucleoside diphosphate reductase gene of Methanobacterium thermoautotrophicum (Mth RIR1 intein), spliced poorly in Escherichia coli when the naturally occurring proline residue adjacent to the N-terminal cysteine of the intein was present. Splicing proficiency increased when this proline was replaced with an alanine residue. In addition, constructs that displayed efficient N- or C-terminal cleavage were created by replacing the C-terminal asparagine or the N-terminal cysteine of the intein, respectively, with an alanine. These constructs were used to generate complementary reactive groups on protein sequences for use in ligation reactions. Reaction between an intein-generated C-terminal thioester on E. coli maltose-binding protein (43 kDa) and an intein-generated cysteine at the N terminus of either T4 DNA ligase (56 kDa) or thioredoxin (12 kDa) ligated the proteins through a native peptide bond. Thus the smallest known intein is capable of splicing, and its unique properties extend intein-mediated protein ligation to the in vitro fusion of large, bacterially expressed proteins.

… DNA ligase with an N-terminal cysteine. The Mth intein used for N-terminal cleavage, intein(N), carried the Pro−1→Gly/Asn134→Ala double mutation. The full-length fusion protein MBP-intein(N)-CBD was separated from the cell extract by binding its CBD portion to a chitin resin. Overnight incubation with 100 mM MESNA induced cleavage of the peptide bond preceding the N terminus of the intein and created a thioester at the C terminus of MBP. The C-terminal cleavage construct, intein(C), carried the Pro−1→Gly/Cys1→Ala double mutation. The precursor CBD-intein(C)-T4 DNA ligase was isolated from induced E. coli cell extract by binding to a chitin resin, as described for N-terminal cleavage. Cleavage of the peptide bond following the C-terminal residue of the intein produced T4 DNA ligase with an N-terminal cysteine. Ligation occurred when the proteins carrying the complementary reactive groups were mixed and concentrated, yielding a native peptide bond between the two reacting species.

Inteins (1), the protein equivalent of self-splicing RNA introns, catalyze their own excision from a precursor protein with the concomitant joining of the flanking protein sequences, known as exteins (reviewed in Refs. 2-4). Almost 100 inteins have been identified (5), and they can be grouped into three classes: 1) inteins containing a homing endonuclease between the two splicing domains, 2) mini-inteins, which lack the homing endonuclease, and 3) a recently described trans-splicing intein (6).
Among the mini-inteins, the smallest is the 134-amino acid intein found in the ribonucleoside diphosphate reductase gene of Methanobacterium thermoautotrophicum (Mth RIR1 intein; Ref. 7). This intein may be close to the minimum amino acid sequence needed to promote splicing. Interestingly, the residue immediately preceding its first amino acid is a proline, Pro−1 (see Fig. 1); a proline at this position was shown to inhibit splicing of the intein found in the 69-kDa vacuolar ATPase subunit of Saccharomyces cerevisiae (Sce VMA intein; Ref. 8).
Studies of the splicing mechanism led to the development of a protein purification system that used thiol-induced cleavage of the peptide bond at the N terminus of the Sce VMA intein (9). Purification with this system yielded a bacterially expressed protein with a C-terminal thioester (9). Two research groups then applied the chemistry described for native chemical ligation (10) to fuse a synthetic peptide bearing an N-terminal cysteine to a bacterially expressed protein bearing a C-terminal thioester (11, 12). This technique, known as intein-mediated protein ligation (IPL) or expressed protein ligation, represented an important advance in protein semisynthesis (reviewed in Refs. 13 and 14). However, the generality of IPL was limited by the requirement for a synthetic peptide as one of the ligation partners.
We describe the next major advance in intein-mediated protein ligation: modification of the Mth RIR1 intein for the facile isolation of a protein with an N-terminal cysteine, which allows two bacterially expressed proteins to be fused in vitro. In addition, the Mth RIR1 mini-intein, the smallest known protein splicing element, was found to be capable of splicing. These results substantially expand the utility of IPL to include labeling of extensive portions of a protein for NMR analysis and the isolation of a greater variety of cytotoxic proteins. This advance also opens the possibility of labeling the central portion of a protein by ligating three fragments in succession.

EXPERIMENTAL PROCEDURES
Mth RIR1 Synthetic Gene Construction-The gene encoding the Mth RIR1 intein along with 5 native N- and C-extein residues (Fig. 1; Ref. 7) was constructed from 10 oligonucleotides (New England Biolabs, Beverly, MA) that together comprised both strands of the gene and overlapped by at least 20 base pairs. To ensure maximal expression in Escherichia coli, the coding region of the synthetic Mth RIR1 intein incorporates 61 silent base mutations in 48 of the 134 codons. The oligonucleotides were annealed by mixing at equimolar ratios (400 nM) in ligation buffer (50 mM Tris-HCl, pH 7.5, containing 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, and 25 µg of bovine serum albumin) followed by heating to 95°C. After cooling to room temperature, the annealed and ligated oligonucleotides were inserted into the XhoI and AgeI sites of pMYB5 (New England Biolabs), replacing the Sce VMA intein and creating the plasmid pMRB8P.
1) 5′-TCGAGGCAACCAACCCCTGCGTATCCGG-…
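The synthetic gene carries 61 silent base changes spread over 48 of its 134 codons. Counts of this kind can be checked by comparing the native and redesigned coding sequences codon by codon while confirming that the encoded protein is unchanged. The Python sketch below does this with the standard genetic code; the two four-codon sequences in the usage line are hypothetical illustrations, not fragments of the Mth RIR1 gene.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_changes(native: str, synthetic: str) -> tuple[int, int]:
    """Return (changed bases, changed codons); every change must be silent."""
    native, synthetic = native.upper(), synthetic.upper()
    if len(native) != len(synthetic):
        raise ValueError("sequences differ in length")
    if translate(native) != translate(synthetic):
        raise ValueError("at least one change alters the encoded protein")
    changed_bases = sum(a != b for a, b in zip(native, synthetic))
    changed_codons = sum(
        native[i:i + 3] != synthetic[i:i + 3] for i in range(0, len(native), 3)
    )
    return changed_bases, changed_codons


# Hypothetical native vs. codon-adjusted fragments, both encoding Cys-Ile-Arg-Gly.
print(silent_changes("TGTATTAGAGGT", "TGCATCCGTGGC"))  # (5, 4)
```

Raising an error for a non-silent change, rather than silently counting it, keeps the check honest: a redesigned gene that alters even one residue is rejected.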
Protein Purification with the N-terminal Cleavage Construct-Purification was performed as described previously for the Sce VMA and Mxe GyrA inteins (9, 11). Briefly, ER2566 cells (11) containing the appropriate plasmid were grown at 37°C in LB broth containing 100 µg/ml ampicillin to an A600 of 0.5-0.6, and protein expression was induced with 0.5 mM IPTG either overnight at 15°C or for 3 h at 30°C. The cells were pelleted by centrifugation at 3,000 × g for 30 min and resuspended in buffer A (20 mM Tris-HCl, pH 7.5, containing 500 mM NaCl). The cells were lysed by sonication, cell debris was removed by centrifugation at 23,000 × g for 30 min, and the supernatant was applied to a column packed with chitin resin (bed volume, 10 ml) equilibrated in buffer A. Unbound protein was washed from the column with 10 column volumes of buffer A. Thiol-induced cleavage was initiated by rapidly equilibrating the chitin resin in buffer B (20 mM Tris-HCl, pH 8, containing 500 mM NaCl and 100 mM 2-mercaptoethanesulfonic acid (MESNA)). The cleavage reaction proceeded overnight at 4°C, after which the protein was eluted from the column.
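Preparing buffer B is a matter of multiplying each molar concentration by the final volume and the formula weight of the solid used. The sketch below makes that arithmetic explicit. It rests on assumptions not stated in the text: Tris-HCl is taken to be made from Tris base titrated to pH with HCl (the acid volume is not computed), MESNA is taken as its sodium salt, and the formula weights are approximate values that should be checked against the reagent label.

```python
# Approximate formula weights in g/mol (assumed reagents; check the lot actually used).
FORMULA_WEIGHT = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "MESNA sodium salt": 164.18,
}

# Buffer B from the text: 20 mM Tris-HCl, pH 8, 500 mM NaCl, 100 mM MESNA (in mol/l).
BUFFER_B = {"Tris base": 0.020, "NaCl": 0.500, "MESNA sodium salt": 0.100}


def grams_needed(component: str, molar: float, litres: float) -> float:
    """Mass of solid (g) = concentration (mol/l) x volume (l) x formula weight (g/mol)."""
    return molar * litres * FORMULA_WEIGHT[component]


def recipe(buffer: dict[str, float], litres: float) -> dict[str, float]:
    """Grams of each solid component for the given final volume, rounded to 1 mg."""
    return {name: round(grams_needed(name, m, litres), 3) for name, m in buffer.items()}


for name, grams in recipe(BUFFER_B, 0.5).items():
    print(f"{name}: {grams} g per 0.5 l")
```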
Protein Purification with the C-terminal Cleavage Construct-Protein purification was performed as described above, except that buffer A was replaced by buffer C (20 mM Tris-HCl, pH 8.5, containing 500 mM NaCl) and buffer B was replaced by buffer D (20 mM Tris-HCl, pH 7.0, containing 500 mM NaCl). In addition, after the column was equilibrated in buffer D, the cleavage reaction proceeded overnight at room temperature. Protein concentrations were determined using the Bio-Rad protein assay.
Protein-Protein Ligation Using IPL-Freshly isolated thioester-tagged protein was mixed with freshly isolated protein containing an N-terminal cysteine residue (starting concentration, 1-200 µM). The solution was concentrated with a Centriprep 3 or Centriprep 30 apparatus (Millipore Corporation, Bedford, MA) and then with a Centricon 3 or Centricon 10 apparatus to a final concentration of 0.15-1.2 mM for each protein. Ligation reactions proceeded overnight at 4°C and were analyzed by SDS-PAGE on 12% Tris-glycine gels (Novex Experimental Technology, San Diego, CA) stained with Coomassie Brilliant Blue.
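Because the ligation protocol states concentrations in molar units while the Discussion quotes mg/ml, a small conversion helps relate the two. The sketch below uses the apparent sizes given in the text (MBP 43 kDa, T4 DNA ligase 56 kDa, thioredoxin 12 kDa) and assumes that no protein is lost on the concentrator membranes when computing the fold reduction in volume.

```python
def mg_per_ml_to_um(mg_per_ml: float, kda: float) -> float:
    """Convert mg/ml (= g/l) to micromolar; 1 kDa corresponds to 1000 g/mol."""
    return mg_per_ml / (kda * 1000.0) * 1e6


def um_to_mg_per_ml(um: float, kda: float) -> float:
    """Inverse of mg_per_ml_to_um."""
    return um * 1e-6 * kda * 1000.0


def fold_reduction(start_um: float, final_um: float) -> float:
    """Volume-reduction factor needed, assuming no protein is lost during concentration."""
    return final_um / start_um


# Apparent sizes quoted in the text, in kDa.
for name, kda in [("MBP", 43.0), ("T4 DNA ligase", 56.0), ("thioredoxin", 12.0)]:
    low, high = um_to_mg_per_ml(150, kda), um_to_mg_per_ml(1200, kda)
    print(f"{name}: 0.15-1.2 mM is about {low:.1f}-{high:.1f} mg/ml")
```

At the 0.15 mM lower bound this gives roughly 6.5 mg/ml for MBP and 8.4 mg/ml for T4 DNA ligase, close to the 6.5-8.5 mg/ml range mentioned in the Discussion; starting from 1 µM, reaching 0.15 mM would require a 150-fold reduction in volume.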
Factor Xa Cleavage of MBP-T4 Ligase Fusion Protein and Protein Sequencing-Two milligrams of an MBP-T4 DNA ligase ligation reaction were bound to 3 ml of amylose resin (New England Biolabs) equilibrated in buffer A (see above). Unreacted T4 DNA ligase was washed from the column with 10 column volumes of buffer A. Unligated MBP and the MBP-T4 DNA ligase fusion protein were eluted from the amylose resin with buffer E (20 mM Tris-HCl, pH 7.5, containing 500 mM NaCl and 10 mM maltose). Overnight incubation of the eluted protein with bovine factor Xa (New England Biolabs) at a 200:1 (w/w) protein:factor Xa ratio at 4°C proteolyzed the fusion protein and regenerated a band on SDS-PAGE gels with an apparent molecular weight similar to that of T4 DNA ligase. N-terminal amino acid sequencing of the proteolyzed fusion protein was performed on a Procise 494 protein sequencer (PE Applied Biosystems, Foster City, CA).
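The 200:1 protein:factor Xa ratio is by weight, so the protease mass follows directly from the substrate mass. In the minimal sketch below, 2 mg is only an illustrative input: the protein actually eluted from the amylose resin, which excludes the unreacted T4 DNA ligase, would weigh less than the 2 mg loaded.

```python
def protease_ug(substrate_mg: float, ratio: float = 200.0) -> float:
    """Micrograms of protease for a substrate:protease ratio expressed w/w (200:1 in the text)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return substrate_mg * 1000.0 / ratio


print(protease_ug(2.0))  # 10.0
```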

RESULTS
Splicing and Cleavage Activity of the Mth RIR1 Intein-The splicing activity of the Mth RIR1 intein with its 5 native N- and C-extein residues was investigated by expressing it as an in-frame fusion between E. coli maltose-binding protein (15) and the chitin-binding domain (16) from Bacillus circulans. In this protein context splicing products were detected (Fig. 2, lane 1), although most of the protein remained in the precursor form (M-R-B). Splicing proficiency increased when Pro−1 was mutated to Ala (Fig. 2, lane 3). In addition, the Pro−1→Ala and Pro−1→Gly mutants displayed cleavage at both the N- and C-terminal junctions of the intein (Fig. 2, lanes 3 and 5). The identities of the splicing and cleavage products were confirmed by Western blot analysis using anti-MBP and anti-CBD polyclonal antibodies (data not shown).
The cleavage and/or splicing activity of the M-R-B precursor was more proficient when protein synthesis was induced at 15°C than when the induction temperature was raised to 37°C (Fig. 2). Replacing Pro−1 with Gly and Cys1 with Ser produced a double mutant, M-R-B (Pro−1→Gly/Cys1→Ser), that displayed only C-terminal cleavage in vivo; this cleavage occurred when protein synthesis was induced at 15°C but not at 37°C (Fig. 2, lanes 7 and 8). Another double mutant, M-R-B (Pro−1→Gly/Cys1→Ala), cleaved slowly even at 15°C, which allowed substantial amounts of the precursor protein to accumulate (data not shown) and made it a promising C-terminal cleavage construct for protein purification.
3 R. Chong, unpublished data.
FIG. 1. Mth RIR1 intein amino acid sequence. Amino acid sequence of the Mth RIR1 intein with 5 native N- and C-extein residues (in bold type). Conserved regions of the splicing domains, N1, N2, N3, N4, C1, and C2 (22), are underlined and enclosed by vertical bars. The N-extein residue adjacent to the first amino acid of the intein is labeled −1, and numbering proceeds toward the N terminus of the protein (i.e. N−2P−1-intein). The intein residues are numbered sequentially starting with the N-terminal amino acid (C+1). C-extein amino acids are numbered beginning with the residue immediately following the intein (i.e. intein-C+1G+2).
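The residue-numbering convention of Fig. 1 can be expressed as a short labeling routine. In this sketch the intein positions are written as plain numbers, following the Cys1 and Asn134 usage in the text, while N-extein positions are negative and C-extein positions carry a plus sign. The three-residue "intein" in the usage line is a hypothetical stand-in for the 134-residue sequence, and ASCII hyphens stand in for minus signs.

```python
def label_residues(n_extein: str, intein: str, c_extein: str) -> list[str]:
    """Attach position labels to one-letter residues following the Fig. 1 convention."""
    labels = []
    # N-extein: the residue adjacent to the intein is -1; numbers grow toward the N terminus.
    for offset, aa in enumerate(n_extein):
        labels.append(f"{aa}{offset - len(n_extein)}")
    # Intein: numbered 1, 2, ... from its N-terminal residue (as in Cys1 ... Asn134).
    for position, aa in enumerate(intein, start=1):
        labels.append(f"{aa}{position}")
    # C-extein: numbered +1, +2, ... starting with the residue right after the intein.
    for position, aa in enumerate(c_extein, start=1):
        labels.append(f"{aa}+{position}")
    return labels


# Hypothetical three-residue stand-in for the intein, flanked by N-P and C-G exteins.
print(label_residues("NP", "CVN", "CG"))
# ['N-2', 'P-1', 'C1', 'V2', 'N3', 'C+1', 'G+2']
```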
Purification Using C- and N-terminal Cleavage Activity-The C- and N-terminal cleavage constructs of the Mth RIR1 intein were used to purify T4 DNA ligase or thioredoxin with an N-terminal cysteine, or MBP with a C-terminal thioester. Two C-terminal cleavage constructs, pBRL-A and pBRT (Fig. 3; data not shown for pBRT), yielded 4-6 mg of T4 DNA ligase and 5-10 mg of thioredoxin per liter of cell culture, respectively. The presence of N-terminal cysteine residues on these proteins was supported by amino acid sequencing after the ligation reaction (see below under "Intein-mediated Protein Ligation").
Conversely, an intein with only N-terminal cleavage activity was generated by changing Pro−1 to Gly and the C-terminal Asn134 to Ala, creating M-R-B (Pro−1→Gly/Asn134→Ala). N-terminal cleavage products were detected when protein synthesis was induced at either 15 or 37°C (Fig. 2, lanes 9 and 10), although more precursor accumulated at the higher induction temperature. The remaining precursor protein could undergo thiol-mediated cleavage with reagents such as dithiothreitol or MESNA and could therefore be used to purify thioester-tagged proteins as described previously (Fig. 3 and Refs. 11 and 12).
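The cleavage constructs follow a simple rule: Cys1 initiates the N-terminal acyl shift that produces the thioester, and cyclization of Asn134 releases the C-terminal junction, so removing one of these residues leaves only the other cleavage reaction. The Python sketch below encodes that rule. It is a simplification consistent with the Cys1→Ala, Cys1→Ser, and Asn134→Ala constructs described here, not a general predictor; a serine at position 1, for example, can support splicing in some other inteins.

```python
def predicted_activity(mutations: dict[str, str]) -> str:
    """
    mutations maps a wild-type residue label (e.g. "Pro-1", "Cys1", "Asn134") to its
    replacement. Simplified rule: any change at Cys1 blocks the N-terminal reaction and
    any change at Asn134 blocks the C-terminal reaction; other changes are ignored.
    """
    n_terminal_ok = "Cys1" not in mutations
    c_terminal_ok = "Asn134" not in mutations
    if n_terminal_ok and c_terminal_ok:
        return "splicing possible"
    if n_terminal_ok:
        return "N-terminal cleavage only"
    if c_terminal_ok:
        return "C-terminal cleavage only"
    return "no cleavage expected"


print(predicted_activity({"Pro-1": "Gly", "Asn134": "Ala"}))  # intein(N)
print(predicted_activity({"Pro-1": "Gly", "Cys1": "Ala"}))    # intein(C)
```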
Intein-mediated Protein Ligation-IPL reactions consisted of mixing freshly purified, thioester-tagged MBP with T4 DNA ligase or thioredoxin bearing an N-terminal cysteine (Fig. 4 and "Experimental Procedures"). Ligation was monitored by the appearance of an additional band on SDS-PAGE (Fig. 3; data not shown for thioredoxin) at the predicted molecular weight of the ligation product. Typical ligation efficiencies ranged from 20 to 60%.
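The band expected for each ligation product can be estimated from the apparent sizes given in the text. In the ligation, the thiol leaving group departs as the amide bond forms, so the product is essentially the two polypeptides joined; it differs from the sum of the free polypeptides by one water equivalent (about 0.018 kDa), which this sketch ignores as negligible at SDS-PAGE resolution.

```python
# Apparent sizes quoted in the text, in kDa.
SIZE_KDA = {"MBP": 43.0, "T4 DNA ligase": 56.0, "thioredoxin": 12.0}


def predicted_product_kda(thioester_protein: str, cysteine_protein: str) -> float:
    """Approximate size of the ligation product (water loss ignored)."""
    return SIZE_KDA[thioester_protein] + SIZE_KDA[cysteine_protein]


for partner in ("T4 DNA ligase", "thioredoxin"):
    print(f"MBP + {partner}: about {predicted_product_kda('MBP', partner):.0f} kDa")
```

The new band should therefore run near 99 kDa for MBP-T4 DNA ligase and near 55 kDa for MBP-thioredoxin, well separated from either starting protein.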
A factor Xa site in MBP located 5 amino acids N-terminal to the site of fusion (17) allowed amino acid sequencing through the ligation junction (see "Experimental Procedures"). The sequence obtained, NH2-TLEGCGEQPTGXLK-COOH, matched the last 4 residues of MBP (TLEG) followed by a linker sequence (CGEQPTG) and the start of T4 DNA ligase (ILK). The sequencing cycle expected to yield isoleucine did not give a signal strong enough to assign a specific residue, so it is represented as X. The cysteine was identified as its acrylamide alkylation product.
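Checking the sequencing read against the expected junction amounts to a position-by-position comparison in which an unassigned cycle (X) matches any residue. The sketch below builds the expected junction from the three segments named in the text and assumes that the read and the expected sequence are aligned from the first cycle and have the same length.

```python
import re


def read_matches(read: str, expected: str) -> bool:
    """True if a sequencing read agrees with the expected sequence, treating 'X' as any residue."""
    pattern = "".join("." if residue == "X" else re.escape(residue) for residue in read)
    return re.fullmatch(pattern, expected) is not None


MBP_TAIL = "TLEG"      # last 4 residues of MBP
LINKER = "CGEQPTG"     # linker sequence, beginning with the ligation cysteine
LIGASE_HEAD = "ILK"    # start of T4 DNA ligase

expected = MBP_TAIL + LINKER + LIGASE_HEAD
print(expected, read_matches("TLEGCGEQPTGXLK", expected))  # TLEGCGEQPTGILK True
```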

DISCUSSION
The C-terminal cleavage activity of the mutated Mth RIR1 intein advanced IPL technology by providing a means to isolate proteins with an N-terminal cysteine for use as substrates in the in vitro fusion of large, bacterially expressed proteins. Initially, an intein that cleaves in vivo was tested for its ability to generate a protein with an N-terminal cysteine. However, the side chain of the N-terminal cysteine residue appeared to be modified in vivo by an unidentified pathway (data not shown). Although this problem could be circumvented by using a protease to cut on the N-terminal side of a cysteine residue, concerns about nonspecific proteolysis and the need to remove the protease after cleavage limited the usefulness of that approach. Interestingly, C-terminal cleavage with the Mth RIR1 intein appeared to protect the cysteine residue until it was released in vitro. A recently developed Sce VMA intein with thiol-inducible C-terminal cleavage activity could not be used, because with a cysteine at the N terminus of the target protein it would splice rather than cleave (18).
The concentration dependence of the ligation reaction is probably explained by the need to raise the ligation rate enough to compete with thioester hydrolysis, which prevents ligation. Protein fusion occurred at 20-40% efficiency at 6.5-8.5 mg/ml of each reactant (data not shown), whereas greater extents of reaction (50-60%, Fig. 3) were observed at higher protein concentrations. Many proteins remain soluble at the lower of these concentrations, suggesting that IPL will be useful for a wide range of applications. However, these conditions are problematic for some proteins, and future work may identify procedures that lower this concentration requirement.
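The competition between ligation and hydrolysis can be illustrated with a deliberately simplified kinetic model: ligation is treated as a single second-order step whose rate scales with the product of the two reactant concentrations, while hydrolysis is first order in thioester, so raising both concentrations favors ligation. The rate constants below are hypothetical, chosen only to show the trend; the model ignores reversible thiol exchange and other losses, so its absolute yields should not be compared with the measured 20-60%.

```python
def ligation_yield(conc_um: float, k_lig: float = 1e-6, k_hyd: float = 1e-5,
                   hours: float = 16.0, dt: float = 10.0) -> float:
    """
    Fraction of thioester-tagged protein ligated for equimolar reactants, by Euler
    integration of:
        d[P]/dt = k_lig [T][C]
        d[T]/dt = -k_lig [T][C] - k_hyd [T]   (thioester also lost to hydrolysis)
        d[C]/dt = -k_lig [T][C]
    Units: uM, k_lig in uM^-1 s^-1, k_hyd in s^-1. Default constants are hypothetical.
    """
    thioester = cysteine = conc_um
    product = 0.0
    for _ in range(int(hours * 3600 / dt)):
        ligated = k_lig * thioester * cysteine * dt
        hydrolyzed = k_hyd * thioester * dt
        thioester -= ligated + hydrolyzed
        cysteine -= ligated
        product += ligated
    return product / conc_um


for conc in (10, 150, 1200):
    print(f"{conc} uM each: {ligation_yield(conc):.0%} of thioester ligated")
```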
N-terminal amino acid sequencing through the ligation junction demonstrated that the two proteins were fused tail-to-head in a continuous polypeptide chain rather than in an unusual branched structure. These data also reinforce past studies reporting that native chemical ligation chemistry forms a native peptide bond (10), because the sequencing reaction can proceed only through peptide bonds between amino acid residues.
Previous studies with the Sce VMA intein reported that splicing was inhibited when a proline replaced the naturally occurring glycine at the −1 position (8). Because the Mth RIR1 intein has a naturally occurring proline at this position, it was expected to be able to splice with this unusual residue. The low splicing activity of the Mth RIR1 intein shows that it is capable of splicing but suggests that it may not fold properly when expressed in E. coli. Alternatively, this intein may require more native extein sequence than was provided, or a cofactor such as a prolyl isomerase, for proficient splicing.
The Mth RIR1 intein primary sequence was compared with the amino acid sequence and crystal structure of another mini-intein, the Mxe GyrA intein (19, 20). Most of the amino acids that form two α-helices and a disordered region in the Mxe GyrA intein appeared to be missing from the Mth RIR1 intein. The α-helical and disordered regions were previously found not to be required for splicing of the Ssp DnaB intein (21), and this portion of the protein may serve only as a linker. The small size of this region in the Mth RIR1 intein may decrease its stability and may account for some of its induction temperature-dependent activity.
The mechanism underlying the induction temperature-dependent splicing and cleavage activity has yet to be determined, but it may involve reactions at the C terminus of the intein, since C-terminal cleavage was more severely affected by induction temperature than N-terminal cleavage (Fig. 2). The Mth RIR1 intein may also misfold in E. coli when induced at the higher temperature, an interesting possibility given that M. thermoautotrophicum is a thermophilic archaeon.
In conclusion, this report demonstrated that the smallest known intein, the Mth RIR1 intein, together with its 5 native extein residues, is capable of splicing. Furthermore, this intein was able to generate both thioester-tagged proteins and proteins with an N-terminal cysteine. The latter is of particular importance because it enabled the next major advance in intein-mediated protein ligation: the fusion of two large, bacterially expressed proteins. This paves the way for greater freedom in labeling proteins for NMR analysis, for the isolation of cytotoxic proteins, and, in the future, for the controlled fusion of three bacterially expressed proteins.