Protein Splicing Involving the Saccharomyces cerevisiae VMA Intein
THE STEPS IN THE SPLICING PATHWAY, SIDE REACTIONS LEADING TO PROTEIN CLEAVAGE, AND ESTABLISHMENT OF AN IN VITRO SPLICING SYSTEM

Protein splicing involves the excision of an internal protein segment, the intein, from a precursor protein and the concomitant ligation of the flanking N- and C-terminal regions. It occurs in mesophilic bacteria, yeast, and thermophilic archaea. The ability to control protein splicing of a thermophilic intein by temperature and pH in a foreign protein context facilitated the study of the mechanism of protein splicing in thermophiles. On the other hand, no direct studies have been done on the mechanism of protein splicing in mesophiles. We examined the splicing of a chimeric protein containing the intein of the vacuolar ATPase subunit (VMA) of Saccharomyces cerevisiae that involves cysteines rather than serines at the reaction center. The steps in the splicing process were deduced by analyzing intermediates and side products that accumulated as a result of amino acid substitutions and were found to be analogous to those occurring in thermophiles. Moreover, appropriate amino acid replacements allowed us to develop the first mesophilic in vitro protein splicing system as well as strategies for modulating the rate of protein splicing and for converting the splicing reaction to an efficient protein cleavage reaction at either splice junction.
Protein splicing is a novel mode of gene expression that has been described in mesophilic bacteria and yeast and in extremely thermophilic archaea (1-6). It is a process in which a single gene directs the synthesis of two separate proteins by the precise excision of an internal protein segment, the intein, from a precursor protein and the concomitant ligation of the flanking N- and C-terminal regions, the exteins, to yield two new proteins (7). Some of the excised inteins are homing endonucleases that catalyze lateral transfer of their DNA coding sequences by an intein homing mechanism (8-11), whereas the ligated exteins are usually enzymes with a specific cellular function.
Efficient protein splicing also occurs when inteins are transferred into heterologous proteins, suggesting that all structural and catalytic elements needed for the splicing reaction reside in the inteins plus the first C-extein residue (4, 12-14). The protein splicing function of inteins is independent of their homing endonuclease activity (15) and depends on highly conserved amino acid residues at both splice junctions. A hydroxyl- or thiol-containing residue (Ser, Thr, or Cys) is always present at the positions that immediately follow the two splice junctions, and the sequence His-Asn is invariant at the intein C terminus. Substitution of any of these conserved residues retards or abolishes protein splicing (2,12,13,15).
The fact that the thermophilic archaeal inteins in a foreign protein context undergo efficient splicing only at elevated temperatures (25-65°C) opened the way for the development of an in vitro system to study the mechanism of protein splicing (14). We constructed a fusion protein containing the intein from the thermostable DNA polymerase of Pyrococcus sp. GB-D, which could be expressed in Escherichia coli at low temperatures (12-15°C), purified as an unspliced precursor protein, and then allowed to undergo splicing at elevated temperatures. The following multi-step reaction mechanism could be deduced from these in vitro studies.
Step 1, formation of an ester intermediate by an N-O acyl rearrangement at the conserved Ser residue at the upstream splice junction (16,39).
Step 2, formation of a branched intermediate by transesterification involving attack by the hydroxyl side chain of the conserved Ser residue at the downstream splice junction on the ester formed in Step 1 (14,17).
Step 3, excision of the intein by peptide bond cleavage coupled to succinimide formation involving the conserved Asn residue at the downstream splice junction (17,18).
Step 4, spontaneous O-N acyl rearrangement of the transitory ligation product from an ester to the more stable amide (Fig. 1).
Even though our studies have provided a relatively complete picture of the steps involved in the splicing of an intein from an extremely thermophilic archaeon, a question has been raised concerning whether this mechanism also applies to protein splicing in mesophilic bacteria and yeast (19). This is because protein splicing in the thermophilic archaea and the mesophilic microbes differs in two major respects, which may reflect mechanistic differences. (i) The conserved amino acid adjacent to the upstream splice junction usually has a hydroxyl side chain (Ser or Thr) in archaea but a thiol side chain (Cys) in mesophilic organisms. (ii) The inteins from mesophilic organisms splice more rapidly at low temperatures than those from archaea in foreign contexts; indeed, they splice at such high rates that no intermediates have been detected.
To determine whether splicing of the Cys-bounded mesophilic Sce VMA intein occurs by the same mechanism as the Ser-bounded intein from a thermophile, we developed an in vitro splicing system based on the VMA intein from the 69-kDa vacuolar ATPase subunit of Saccharomyces cerevisiae (1). Although the rapid splicing in this system precluded the isolation of the unprocessed precursor, we succeeded in defining the key intermediates by arresting the splicing process at specific steps, either by mutationally altering critical amino acid residues or by chemical trapping. Moreover, in one case we were able to attenuate the splicing process by amino acid replacements so that the unspliced precursor could be isolated and its splicing studied in vitro. Our results show that the series of reactions outlined in Fig. 1 adequately describes protein splicing both in thermophilic and mesophilic organisms. In addition, the understanding of the specific roles of the key amino acid residues in the splicing process afforded by our studies led to strategies for effecting efficient protein cleavage at either splice junction.

EXPERIMENTAL PROCEDURES
Numbering of Residues in MYT-Amino acid numbers refer to the position in the S. cerevisiae VMA intein, with Cys-1 being the first residue in the intein and Asn-454 the last. The numbering continues sequentially into the C-extein (T), beginning with Cys-455. Consecutive negative numbers are used to designate residues in the N-extein (M), with the Gly residue at the splice junction being Gly-(−1). This numbering system is illustrated in Fig. 2A.
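The numbering convention can be written down as a small lookup. The following is a minimal sketch, not part of the original methods; the function name `region` and the constant `INTEIN_LENGTH` are hypothetical, and the only facts used are those stated above (Cys-1 to Asn-454 in the intein, Cys-455 as the first C-extein residue, negative numbers in the N-extein, and no residue numbered zero).

```python
# Illustrative sketch of the residue numbering used for MYT.
# Names are hypothetical; the boundaries come from the text.
INTEIN_LENGTH = 454  # Cys-1 ... Asn-454


def region(position: int) -> str:
    """Return the MYT region that a residue number falls in."""
    if position == 0:
        # Gly-(-1) is immediately followed by Cys-1, so 0 is never used.
        raise ValueError("the numbering scheme has no residue 0")
    if position < 0:
        return "N-extein (M)"
    if position <= INTEIN_LENGTH:
        return "intein (Y)"
    return "C-extein (T)"


for pos in (-6, -1, 1, 453, 454, 455):
    print(pos, region(pos))
```

For example, Gly-(−1) and Cys-1 flank the upstream splice junction, and Asn-454 and Cys-455 flank the downstream one.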
Construction of MYT-The Sce VMA intein with the first C-extein codon (Cys) was synthesized by the polymerase chain reaction from pT7VDE (gift of Dr. Frederick S. Gimble) containing a portion of the VMA1 gene including the entire PI-SceI sequence (8) using the primers 5′-GCG CTC GAG GGG TGC TTT GCC AAG GGT ACC AAT-3′ and 5′-CC TCC GCA ATT ATG GAC GAC AAC CTG GT-3′. Polymerase chain reaction mixtures (50 µl) contained Vent DNA polymerase buffer (New England Biolabs), 4 mM MgSO4, 400 µM each of the 4 dNTPs, 1 µM of each primer, 50 ng of pT7VDE DNA, and 0.5 units of Vent DNA polymerase. Amplification was carried out for 20 cycles using a Perkin-Elmer thermal cycler at 94°C for 30 s, 50°C for 30 s, and 72°C for 5 min. The product was digested with XhoI and ligated with pMIP21 (17) that had been digested with XhoI and StuI, to yield pMYP1.
To introduce restriction sites for cassette replacement of the downstream splice junction, an XhoI-PstI fragment from pMYP1, containing the PI-SceI sequence, was first subcloned into the multicloning site of LITMUS 29 (New England Biolabs), to create pLitYP. Single-stranded DNA was generated from pLitYP and used as the template for mutagenesis by the method of Kunkel (20). The mutagenic primers 5′-GAA TGC GGA ATT CAG GCC TCC GCA-3′ and 5′-ATG GAC GAC AAC CTG GGA TCC AAG CAA AAA CTG ATG ATC-3′ created two unique restriction sites, BamHI and EcoRI, flanking the C-terminal splice junction of the VMA intein, with Ala-447 and Asn-448 of the intein being replaced by Gly and Ser, respectively. The XhoI-PstI fragment containing these mutations was then used to replace the corresponding segment of pMYP1 to yield pMYP2.
The E. coli thioredoxin gene was synthesized by the polymerase chain reaction from pETrx (provided by F. Feng and M.-Q. Xu), which contains the thioredoxin coding sequence inserted at the XmnI site in pMAL-c2 (New England Biolabs), using the primers 5′-GGC CTG AAT TCC ATG AGC GAT AAA ATT ATT CAC-3′ and 5′-G TCG ATC TGC AGG TCA TTA CGC CAG GTT AGC GTC GAG-3′. Polymerase chain reaction mixtures (100 µl) contained Vent DNA polymerase buffer (New England Biolabs), 3 mM MgSO4, 300 µM each of the 4 dNTPs, 10 µM of each primer, 50 ng of pETrx DNA, and 0.5 units of Vent DNA polymerase. Amplification was carried out for 22 cycles using a Perkin-Elmer thermal cycler at 94°C for 30 s, 50°C for 1 min, and 72°C for 1 min. The product was digested with EcoRI and PstI and ligated with pMYP2 digested with EcoRI and PstI, to yield pMYT1. Unless otherwise stated, all enzymes and plasmids used were the products of New England Biolabs, Inc.
Mutagenesis of MYT-pMYT1 contains an XhoI site and a KpnI site flanking the N-terminal splice junction and a BamHI site and an EcoRI site flanking the C-terminal splice junction. These unique sites allow convenient mutagenesis by cassette substitution. pMYT1 was digested with XhoI and KpnI and then ligated with the complementary oligomers, 5′-TC GAG GGA TCC TTT GCC AAG GGT AC-3′ and 5′-C CTT GGC AAA GGA TCC C-3′, resulting in pMYT1(C1S), in which Ser replaces Cys-1. pMYT1 was digested with BamHI and EcoRI and then ligated with the complementary oligomers, 5′-GA TCC CAG GTT GTC GTC CAT GCA TGC GGA GGC CTG-3′ and 5′-A ATT CAG GCC TCC GCA TGC ATG GAC GAC AAC CTG G-3′, to create pMYT1(N454A). The same pMYT1 digest was also used with different pairs of complementary oligomers to create pMYT1(N454A/C455A), pMYT1(H453L/C455S), and pMYT1(N454A/C455S), respectively. The double mutant pMYT1(C1S/N454A) was constructed by using the XhoI-BamHI fragment from pMYT1(C1S) to replace the corresponding fragment in pMYT1(N454A). The triple mutant pMYT1(C1S/N454A/C455S) was made by using the XhoI-BamHI fragment from pMYT1(C1S) to replace the corresponding fragment in pMYT1(N454A/C455S). pMYT1(L446M) was made by the Kunkel mutagenesis method as described above, using the mutagenic primer 5′-ATG GAC GAC AAC CTG GTT GGC CAT CAA AAA CTG ATG ATC-3′, yielding the sequence MANQVVVHN at the intein C terminus. pMYT1(C1S/L446M) was constructed by combining the mutations C1S and L446M from pMYT1(C1S) and pMYT1(L446M). pMYT1(R(−6)M/E(−2)A) was made by cassette substitution so as to introduce a Met residue at position −6 and an Ala at position −2 of the C terminus of the N-extein, yielding MGTLAG as the extein sequence flanking Cys-1. This mutation was combined with the N454A mutation to yield pMYT1(R(−6)M/N454A).
MYT Expression and Purification on Amylose Columns-The proteins produced by pMYT1 or its mutant derivatives are referred to as MYT, which stands for maltose binding protein-yeast intein-thioredoxin fusion. E. coli strain ER2267 (provided by Elisabeth Raleigh, New England Biolabs), harboring pMYT1 or its mutant derivatives, was grown at 37°C in 1 liter of LB medium containing 100 µg of ampicillin/ml to an A600 of 0.5-0.8. The culture was then transferred to a 15°C air shaker and induced with 0.4 mM isopropyl-β-D-thiogalactoside for 16 h. The cells were harvested and disrupted by sonication in 50 ml of phosphate column buffer (20 mM sodium phosphate, pH 6.5, 0.5 M NaCl) or Hepes column buffer (20 mM Hepes, pH 7.6, 0.5 M NaCl). After centrifugation at 25,000 × g for 30 min, the crude supernatant was passed through an amylose column (New England Biolabs; 2 ml of resin per g of cells), and the column was then washed with one of the above column buffers until the protein content of the eluant reached a minimum. The fusion proteins were eluted from the column by the same column buffer supplemented with 10 mM maltose. Approximately 10-30 mg of fusion protein was obtained from 1 liter of culture. The fusion proteins were analyzed by SDS-PAGE, followed by Coomassie Blue staining and Western blot analysis using antibodies raised against each domain (maltose binding protein, VMA intein, or thioredoxin), as described below.
In Vitro Splicing of MYT(H453L/C455S)-E. coli strain ER2267 harboring pMYT1(H453L/C455S) was induced by 0.4 mM isopropyl-β-D-thiogalactoside at 15°C for 16 h, and the proteins were purified on amylose resin in 20 mM Hepes, pH 7.6, 0.5 M NaCl. The purified protein (1 mg/ml) was immediately subjected to in vitro splicing by incubating samples in 20 mM Hepes, pH 7.6, 0.5 M NaCl at 25°C for 16 h. Experiments were also done at other pH values (pH 6-8) and temperatures (4 or 25°C) and with 40 mM dithiothreitol. Splicing was monitored by SDS-PAGE. Amylose-purified samples from cells induced at 30°C for 3 h contained less unspliced precursor.
Chemical Cleavage of the MYT Mutant Proteins-MYT fusion proteins (1.0-1.5 mg/ml), freshly purified on amylose resins either in phosphate buffer at pH 6.5 or in Hepes buffer at pH 7.6, were incubated in the corresponding column buffer at 25 or 4°C with or without hydroxylamine, cysteine, or dithiothreitol. At appropriate times, samples (40 µl) were removed, mixed with 20 µl of 3× SDS Sample Buffer (New England Biolabs), boiled briefly, and analyzed by SDS-PAGE and Western blotting as described below. The effect of pH on the branched intermediate derived from MYT(N454A/C455S) was studied in the same way, except that the purified fusion protein was kept in phosphate buffer, pH 6.5, for 1 week at 4°C to allow conversion of the precursor to the branched intermediate.
For analysis of the C terminus of the N-extein after cleavage with hydroxylamine, samples (1.5 mg/ml) of amylose-purified MYT(R(−6)M/N454A) were treated at 25°C for 6 h with 250 mM hydroxylamine in 20 mM sodium phosphate, pH 6.5, and 0.5 M NaCl; the C-terminal peptide of the N-extein (maltose binding protein or M) was then analyzed after CNBr cleavage as described below.
Peptide Synthesis and Cyanogen Bromide Treatment-Peptides were synthesized on an Applied Biosystems model 431 peptide synthesizer as described earlier (17). ANQVVVHN and GTLAG were prepared by CNBr cleavage of 2,4-dinitrophenyl-LMANQVVVHN and 2,4-dinitrophenyl-LMGTLAG, respectively, as described below. The peptide methyl esters ANQVVVHN-OMe and GTLAG-OMe were prepared from the corresponding peptides as described previously (18). ANQVVVHN> was obtained from ANQVVVHN-OMe by heating at pH 5.5 and 90°C for 1 h. GTLAG hydroxamic acid was prepared from GTLAG-OMe by heating at 65°C with 2 M NH2OH, pH 9.0, for 30 min, essentially as described by Kwong and Harris (21). The reaction products were then repeatedly purified by HPLC to remove traces of NH2OH that would interfere with the colorimetric assay for hydroxamic acids described below.
FIG. 1. Proposed mechanism for protein splicing involving the Sce VMA intein. The proposed mechanism is supported by the data presented in this paper. The mechanism of protein splicing in thermophilic archaea proceeds by four analogous chemical steps, except that the Cys residues shown in the diagram are usually replaced by Ser, so that Steps 1 and 4 are N-O and O-N acyl shifts, respectively (14, 16-18). The succinimide derivative is relatively stable and can be isolated together with its hydrolysis product as shown by the data of Xu et al. (17) and this paper. Its hydrolysis leads to a mixture of C-terminal asparagine (as shown) as well as isoasparagine. The dotted arrows on the right indicate the side reactions that can occur as a result of amino acid substitutions at the splice junctions. See text for detailed explanation.
For cyanogen bromide treatment, peptide or protein samples (1 mg or less) were first precipitated at 0°C in 10% trichloroacetic acid, and the precipitates were washed twice with ethanol, dried in a vacuum, dissolved in 1 ml of 70% formic acid, and treated with 20 mg of CNBr under nitrogen at 25°C in the dark for 15-20 h, followed by evaporation in a vacuum (22). The residues were redissolved in a small volume of water and purified by reverse phase HPLC.
Analytical Methods-SDS-PAGE was performed in 12% Tris glycine gels (Novex, San Diego, CA), followed by staining with Coomassie Blue. The gels were then blotted onto nitrocellulose membranes and analyzed by probing with polyclonal antibodies against maltose binding protein (New England Biolabs), the Sce VMA intein (gift of Dr. F. S. Gimble), and thioredoxin (American Diagnostica Inc.), as described by Perler et al. (5). The separation of peptides by HPLC employed a Rainin system with an analytical C-18 reverse phase column (Vydac; 5-µm pores, 4.6 × 250 mm) at room temperature and a flow rate of 1 ml/min, with linear gradients of solvent A (0.1% aqueous trifluoroacetic acid) and solvent B (0.1% trifluoroacetic acid in acetonitrile). Amino acid analysis employed a Beckman model 7300 high performance analyzer with a System Gold data analysis module after vapor phase hydrolysis with HCl using a Waters PicoTag work station. High resolution mass spectra were recorded on a Jeol JMS-SX102 spectrometer at the Department of Chemistry, Harvard University. The colorimetric determination of hydroxamic acids was done by the method of Seifter et al. (23) as described earlier (18). A standard curve with GTLAG-hydroxamate showed a linear relationship between hydroxamic acid concentration and absorbance at 520 nm from 5 to 50 nmol of GTLAG-hydroxamate, with 50 nmol yielding A520 of 1.4. N-terminal sequences were determined by sequential Edman degradation with an Applied Biosystems 470A protein sequencer on proteins electroblotted on ProBlott polyvinylidene difluoride membranes as described previously (14). Protein concentrations were estimated by the method of Bradford (24).
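The hydroxamate standard curve can be turned into a simple conversion from absorbance to amount. The sketch below is illustrative only: it assumes the calibration line passes through the origin (the text states only that the response is linear from 5 to 50 nmol, with 50 nmol giving A520 of 1.4), and the function name `hydroxamate_nmol` is hypothetical.

```python
# Illustrative conversion of A520 to nmol of hydroxamate.
# Assumption (not stated in the text): the line passes through the origin.
A520_AT_50_NMOL = 1.4
LINEAR_RANGE_NMOL = (5.0, 50.0)


def hydroxamate_nmol(a520: float) -> float:
    """Estimate nmol of hydroxamate from absorbance at 520 nm."""
    nmol = a520 * 50.0 / A520_AT_50_NMOL
    low, high = LINEAR_RANGE_NMOL
    if not low <= nmol <= high:
        # Outside the range over which linearity was reported.
        raise ValueError(f"{nmol:.1f} nmol is outside the linear range")
    return nmol


print(round(hydroxamate_nmol(0.70), 1))  # 25.0
```

Readings that fall outside the calibrated range raise an error rather than being extrapolated.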

RESULTS
Construction of a Fusion Protein Carrying the Sce VMA Intein-A gene carrying the coding sequence for the VMA intein from the 69-kDa vacuolar ATPase subunit of S. cerevisiae (1) (Y) was inserted in-frame between the genes for the E. coli maltose-binding protein as the N-extein (M) and E. coli thioredoxin as the C-extein (T), to yield a continuous open reading frame encoding a fusion protein termed MYT (Fig. 2A). Plasmid pMYT1, which carries the intact MYT coding region, was used to transform E. coli, and expression of MYT was induced at 37°C with isopropyl-β-D-thiogalactoside. When the products containing the maltose binding protein were isolated by chromatography on an amylose column and examined by SDS-PAGE, the major component detected was a 54-kDa protein, the size expected for the ligated exteins, MT (Fig. 2B, lane 2), indicating that efficient splicing had occurred in vivo. Similar results were obtained when induction was performed at 12°C (data not shown), suggesting that the splicing of MYT occurs efficiently even at low temperatures.
Effect of Amino Acid Substitutions on the Splicing of a Fusion Protein Carrying the Sce VMA Intein-To arrest the splicing of MYT at specific stages, the conserved amino acid residues at the splice junctions (Cys-1, Asn-454, and Cys-455) were replaced with other amino acids as shown in Fig. 2A, and the mutagenized plasmids were expressed in E. coli. The proteins were purified by chromatography on amylose columns and analyzed for apparent molecular mass by SDS-PAGE (Fig. 2B) and for composition by Western blot analysis using antibodies raised against the maltose binding protein, the VMA intein, and thioredoxin (data not shown). When Asn-454 was replaced by Ala (N454A), a 104-kDa polypeptide accumulated whose size was consistent with that predicted for the unspliced primary translation product, MYT (Fig. 2B, lane 5). When either Cys residue was replaced by Ser (C1S or C455S), no protein splicing was observed and, instead of MT, a larger polypeptide accumulated that has a molecular mass of 92 kDa, the predicted size of MY (Fig. 2B, lanes 3 and 4), suggesting that instead of splicing the protein underwent cleavage at the C-terminal splice junction. The cleaved thioredoxin domain (T) was not purified on an amylose column due to the absence of the maltose binding domain. On the other hand, replacement of Asn-454 plus one Cys at either splice junction prevented both splicing and C-terminal cleavage. Expression of the double mutant C1S/N454A led to the accumulation of the 104-kDa precursor, MYT (Fig. 2B, lane 6), whereas the double mutant N454A/C455S produced the 104-kDa precursor plus a more slowly migrating protein species, MYT* (Fig. 2B, lane 7), whose identification is described in a later section. The triple mutant C1S/N454A/C455S produced only the 104-kDa precursor, MYT (Fig. 2B, lane 8).
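The band assignments rest on simple mass arithmetic. The sketch below back-calculates approximate domain sizes from the three sizes quoted in the text (MYT, 104 kDa; MY, 92 kDa; MT, 54 kDa) and then predicts the size of any linear combination of domains. The domain values are derived here for illustration, are approximate SDS-PAGE sizes rather than measured masses, and do not apply to the branched species MYT*, whose mobility is anomalous; the names `DOMAINS` and `predicted_kda` are hypothetical.

```python
# Illustrative mass bookkeeping for the MYT fusion and its products.
# Inputs are the apparent sizes (kDa) quoted in the text.
MYT_KDA, MY_KDA, MT_KDA = 104, 92, 54

# Derived, approximate domain sizes (not reported directly in the text).
T_KDA = MYT_KDA - MY_KDA   # thioredoxin (C-extein)
Y_KDA = MYT_KDA - MT_KDA   # VMA intein
M_KDA = MT_KDA - T_KDA     # maltose binding protein (N-extein)
DOMAINS = {"M": M_KDA, "Y": Y_KDA, "T": T_KDA}


def predicted_kda(species: str) -> int:
    """Sum the domain sizes for a linear species such as 'MY' or 'YT'."""
    return sum(DOMAINS[domain] for domain in species)


for species in ("MYT", "MY", "MT", "YT", "M", "Y", "T"):
    print(f"{species:>3}: {predicted_kda(species)} kDa")
```

Under these assumptions, cleavage at the upstream splice junction would be expected to give M at about 42 kDa and YT at about 62 kDa.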
Evidence for the Cyclization of Asn-454 to Yield C-terminal Aminosuccinimide-To explore the mechanism of cleavage at the C-terminal splice junction, either in the course of protein splicing or in the C1S mutant, where C-terminal cleavage rather than splicing occurs (Fig. 2B, lane 3), we employed a strategy similar to that used earlier for the identification of the C terminus of the excised archaeal Psp pol intein-1 (17,18). This involved replacing Leu-446 with Met (Fig. 2A) so as to allow easy analysis of the C terminus of the intein after CNBr cleavage to release the terminal octapeptide ANQVVVHN. The CNBr cleavage products derived from the intein were compared with the synthetic model peptides ANQVVVHN and ANQVVVHN>, the latter having a C-terminal aminosuccinimide (N>) residue. HPLC fractionation of the CNBr peptides derived either from the intein (Y) excised by splicing of MYT or of the cleavage product (MY) from the MYT/C1S mutant yielded two approximately equal peaks with elution times corresponding to those of the ANQVVVHN and ANQVVVHN> standards. After repeated cycles of purification, the peaks were subjected to amino acid analysis and high resolution mass spectrometry, which confirmed that the C-terminal peptides derived from Y and MY consisted of ANQVVVHN and ANQVVVHN> (Table I). We can therefore conclude that cleavage at the C-terminal splice junction, either as a result of protein splicing or in the abortive cleavage reaction that occurred when splicing was blocked by the C1S mutation, is coupled to the cyclization of the terminal Asn residue to aminosuccinimide.
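The reason for introducing a Met just upstream of the terminal peptide can be illustrated with an in silico CNBr digest. This is a minimal sketch: the function name `cnbr_fragments` is hypothetical, cleavage is modeled simply as a cut on the C-terminal side of every Met, and the conversion of Met to homoserine (lactone) that accompanies the real reaction is ignored.

```python
# Illustrative CNBr digest: cut after every Met in a one-letter sequence.
def cnbr_fragments(sequence: str) -> list[str]:
    """Split a peptide sequence on the C-terminal side of each Met (M)."""
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# C-terminal sequence of the L446M intein and the two model peptides
# (without their 2,4-dinitrophenyl groups) named in the text.
print(cnbr_fragments("MANQVVVHN"))   # ['M', 'ANQVVVHN']
print(cnbr_fragments("LMANQVVVHN"))  # ['LM', 'ANQVVVHN']
print(cnbr_fragments("LMGTLAG"))     # ['LM', 'GTLAG']
```

In each case the last fragment is the short C-terminal peptide that is then compared with synthetic standards by HPLC.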
Evidence for an N-S Acyl Rearrangement at the Upstream Splice Junction-If the initial step in protein splicing in yeast involved an acyl rearrangement at the upstream splice junction analogous to that seen in an intein from an extreme thermophile, one should be able to observe a thioester intermediate involving the sulfhydryl group of Cys-1 of the intein. Even though it would be difficult to detect such an intermediate in fusion proteins containing a normal intein, given the rapid rate of splicing, a thioester intermediate should be detectable in proteins containing an intein with mutations at the downstream splice junction that interfere with subsequent splicing steps, such as N454A or C455A. Accordingly, we purified the unspliced precursor protein, MYT, containing inteins with either the single mutation N454A or the double mutation N454A/C455A, and treated them with hydroxylamine or thiols at pH 6.5 and 25°C, nucleophiles known to cleave thioesters under such mild conditions (25,26). Cleavage of the N454A precursor at the upstream splice junction to yield M and YT occurred with a t1/2 of 30 min at 25°C in the presence of 0.2 M hydroxylamine and was essentially complete in 16 h at 4°C in the presence of 50 mM cysteine, whereas no cleavage occurred in the absence of either nucleophile (Fig. 3A). Similar results were obtained with MYT derived from the double mutant, N454A/C455A (Fig. 3B). On the other hand, replacement of Cys-1 to yield the double mutant, C1S/N454A, completely prevented hydroxylamine- or thiol-induced cleavage (data not shown).
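To put the reported half-life in context, the fraction of precursor expected to remain at a given time can be estimated. The sketch below assumes simple (pseudo-)first-order decay of the precursor in excess hydroxylamine, which is an assumption made here for illustration and is not stated in the text; the function name `fraction_remaining` is hypothetical.

```python
# Illustrative estimate of uncleaved N454A precursor over time.
# Assumption: pseudo-first-order kinetics with the reported t1/2 of 30 min
# (25 degrees C, 0.2 M hydroxylamine).
HALF_LIFE_MIN = 30.0


def fraction_remaining(t_min: float, half_life_min: float = HALF_LIFE_MIN) -> float:
    """Fraction of precursor left after t_min minutes."""
    return 0.5 ** (t_min / half_life_min)


for t in (0, 30, 60, 120):
    print(f"{t:>3} min: {fraction_remaining(t):.4f}")
```

Under this assumption about 6% of the precursor would remain after 2 h.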
To determine whether nucleophilic attack by hydroxylamine occurred precisely at the upstream splice junction, we used a mutant protein in which Arg at position −6 was replaced by Met (Fig. 2A) so as to allow characterization of the C-terminal pentapeptide of the maltose binding protein after cleavage with CNBr. In addition, Glu at position −2 was changed to Ala to facilitate the chemical synthesis of the corresponding peptide derivatives; neither amino acid substitution affected protein cleavage or splicing. On separation of the CNBr peptides obtained from hydroxylamine-treated MYT from the triple mutant, R(−6)M/E(−2)A/N454A, by reverse-phase HPLC, a single hydroxamate-containing component was obtained that eluted at the same position as synthetic GTLAG-hydroxamate (10.6 min) and on mass spectrometric analysis yielded M + H+ of 433.2399, in close agreement with that obtained with synthetic GTLAG-hydroxamate (433.2393) and the predicted value of 433.2411. This observation supports the conclusion that cleavage by hydroxylamine involved attack on the Gly-Cys bond at the upstream splice junction.
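The predicted mass of GTLAG-hydroxamate and the size of the deviations can be checked with a short calculation. This sketch uses standard monoisotopic residue and element masses; it treats the hydroxamate as the free peptide with its C-terminal -COOH replaced by -CONHOH (a net gain of NH), and it adds the mass of a hydrogen atom for the M + H species, which is an assumption made here because it reproduces the quoted predicted value. The function names are hypothetical.

```python
# Illustrative mass check for GTLAG-hydroxamate (monoisotopic masses, Da).
RESIDUE = {"G": 57.021464, "A": 71.037114, "T": 101.047679, "L": 113.084064}
H, N, O = 1.00782503, 14.0030740, 15.9949146


def hydroxamate_mh(peptide: str) -> float:
    """M + H of a peptide C-terminal hydroxamate (hydrogen atom mass added)."""
    free_acid = sum(RESIDUE[r] for r in peptide) + 2 * H + O  # add H2O
    hydroxamate = free_acid + N + H  # -COOH -> -CONHOH adds NH
    return hydroxamate + H


def ppm_error(observed: float, predicted: float) -> float:
    """Relative deviation of an observed mass in parts per million."""
    return (observed - predicted) / predicted * 1e6


predicted = hydroxamate_mh("GTLAG")
print(f"predicted M + H: {predicted:.4f}")  # 433.2411
print(f"protein-derived: {ppm_error(433.2399, 433.2411):.1f} ppm")
print(f"synthetic:       {ppm_error(433.2393, 433.2411):.1f} ppm")
```

Both measured values lie within about 5 ppm of the predicted mass.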
To ascertain that cleavage of MYT by Cys involved nucleophilic displacement at a thioester bond rather than reduction of a disulfide bond involving Cys-1, cysteine-induced cleavage of MYT from the N454A/C455A double mutant was carried out with 35S-labeled cysteine, and the distribution of radioactivity in the cleavage products was determined by radioautography. In the presence of 10 mM Cys to reduce nonspecific binding, the radioactivity was exclusively associated with M, the result expected if cleavage involved the nucleophilic attack of cysteine on a thioester bond linking M with YT (Fig. 3C). This conclusion was supported by C-terminal analysis of M released in a parallel experiment, which showed that 90% of the polypeptide chains were terminated by Cys, the rest by Gly (data not shown).
To determine whether the thioester intermediate accumulated to a significant extent or whether its reaction with hydroxylamine shifted a relatively unfavorable equilibrium, the N454A precursor was denatured with 6 M guanidinium chloride, pH 6.5, followed by treatment with hydroxylamine under the conditions described in Fig. 3. Only trace amounts of cleavage products were found under these conditions, suggesting that the N-S equilibrium may strongly favor native peptide bond formation (results not shown).
Evidence for the Formation of a Branched Intermediate-For reasons to be discussed later, the detection of a branched intermediate in the splicing of proteins containing the mesophilic Sce VMA intein, analogous to that seen with a thermophilic intein (14,17), would be relatively difficult. This problem was addressed by using a mutant protein in which Asn-454, which is required for the final cleavage reaction (see above), was replaced by Ala and Cys-455, the putative branch point, by Ser.
If a branched intermediate were formed, the latter substitution would cause the branch to be linked through an oxygen ester, which would be more stable in vivo and during purification than a thioester bond. Indeed, SDS-PAGE of the maltose binding protein-containing products derived from the MYT mutant, N454A, showed the predominant species to be the linear precursor MYT, whereas the MYT double mutant, N454A/C455S, yielded in addition a more slowly migrating polypeptide, MYT* (Fig. 2B). Western blot analysis showed that both MYT and MYT* reacted with sera specific to the maltose binding protein, the yeast intein, and thioredoxin (data not shown). Amino-terminal sequencing of the more slowly migrating product showed the release of roughly equivalent amounts of two amino acids at each sequencing cycle, one corresponding to the N-terminal sequence of the maltose binding protein, the other to that of the VMA intein, suggesting that MYT* is a branched protein with two N-terminal polypeptide chains (Table II).
When the mixture of proteins produced by the N454A/C455S double mutant was kept for prolonged periods at pH 6.5 and 4°C, the amount of MYT* gradually increased to 30% of the total protein. The amount of linear precursor declined to yield MYT* and the cleavage products M and YT (Fig. 4A, lanes 1 and 2). The stability of MYT* was examined by incubating the material (previously equilibrated at 4°C and pH 6.5) at 37°C both at low and high pH. The amount of linear precursor was not significantly affected at either pH, whereas the branched intermediate MYT* was completely degraded at pH 9.5 (but not at pH 5.5) with a corresponding increase of YT and M (Fig. 4A). In contrast, treatment with neutral hydroxylamine led to the N-terminal cleavage of the linear precursor, MYT, to yield M and YT, but had little effect on MYT* (Fig. 4B). These observations suggested that the maltose binding protein M was attached to the branched intermediate, MYT*, by a relatively alkali-labile and hydroxylamine-resistant bond, consistent with the properties of an oxygen ester.
In Vitro Splicing of a Mutant Protein and Detection of Intermediates-Although many of the mutant inteins described so far could undergo side reactions related to protein splicing, none were able to yield normal splicing products in significant yields. However, when expressed in E. coli at 12°C, one fusion protein carrying the double amino acid substitution H453L/C455S yielded the linear precursor, MYT, together with a more slowly migrating component whose mobility was similar to that of the branched intermediate (Fig. 5). The precursors could be purified at 4°C, yet underwent nearly quantitative splicing to MT and Y when incubated overnight at 25°C (Fig. 5, lanes 2 and 3) or for 3 days at 4°C (data not shown), but treatment with dithiothreitol at 25°C promoted cleavage at the upstream splice junction to yield M and YT (Fig. 5, lane 4). The identities of the precursor, intermediate, and products were confirmed by Western blot analysis with antibodies specific for M, Y, and T (data not shown). This suggested that splicing of the H453L/C455S intein was accompanied by the formation of the linear thioester and the branched intermediate, which occurred as dead-end products in the rearrangements of some of the nonsplicing mutant inteins in the earlier sections.

FIG. 3. Nucleophile-

TABLE II
Cycle   Residues detected     Maltose binding protein   VMA intein
2       Lys (7), Phe (7)      Lys                       Phe
3       Thr (20), Ala (15)    Thr                       Ala
4       Glu (16), Lys (4)     Glu                       Lys
5       Glu (17), Gly (16)    Glu                       Gly
6       Gly (12), Thr (24)    Gly                       Thr
7       Lys (10), Asn (11)    Lys                       Asn
8       Leu (22), Val (11)    Leu                       Val
9       Val (18), Leu (19)    Val                       Leu
10      Ile (22), Met (7)     Ile                       Met
11      Trp (5), Ala (29)     Trp                       Ala
12      Ile (26), Asp (15)    Ile                       Asp
13      Asn (13), Gly (24)    Asn                       Gly
14      Gly (29), Ser (13)    Gly                       Ser
15      Asp (17), Ile (20)    Asp                       Ile
16      Lys (16), Glu (10)    Lys                       Glu
a The first amino acid residue in cycle 1 was detected as the oxidized form of cysteine.

DISCUSSION
The results presented in this paper provide strong evidence that protein splicing involving an intein from yeast proceeds by the reaction pathway outlined in Fig. 1. It appears, therefore, that similar protein splicing mechanisms have evolved in mesophilic eukaryotes and in hyperthermophilic archaea. Owing to the fact that protein splicing in mesophiles occurs so rapidly that intermediates cannot ordinarily be isolated, our dissection of the protein splicing mechanism had to rely on the use of mutants blocked in specific steps. The advantage of this type of approach was that it allowed us to demonstrate the first in vitro protein splicing for a mesophilic intein, which provided insights into the specific roles of the key amino acid residues in protein splicing and yielded strategies for the subversion of the splicing process into efficient protein cleavage reactions at either splice junction.

Roles of the Conserved Amino Acid Residues in the Protein Splicing Pathway
Earlier studies on protein splicing in mesophilic mycobacteria and yeast showed that three of the amino acid residues flanking the splice junctions are essential for the splicing process, the two Cys at the C-terminal side of both splice junctions and the Asn at the N-terminal side of the downstream splice junction. Replacement of any of these amino acid residues completely blocked the ability of the proteins to undergo splicing (2,12,13), an observation confirmed by our results with a fusion protein containing the VMA intein of S. cerevisiae between heterologous flanking regions (Fig. 2B, lanes 3-8). The earlier studies relied on single amino acid substitutions that completely suppressed the splicing reaction and therefore could not yield definitive insights into the mechanism of protein splicing, except for the observation that elimination of either Cys residue led to cleavage adjacent to the Asn residue at the downstream splice junction, suggesting that one of the steps in the protein splicing pathway is the hydrolysis of a peptide bond involving Asn (2,12,13).
In this paper, we have extended the mutation approach by studying the effect of multiple amino acid substitutions, chosen so as to arrest the splicing process at specific steps, and then identifying the intermediates in terms of the products of either spontaneous or chemically induced decomposition. This approach allowed us to define the sequence of steps outlined in Fig. 1. The first three steps in the splicing pathway, each of which depends critically on one of the three essential amino acid residues, are as follows.
Step 1, N-S acyl rearrangement involving Cys-1; step 2, transesterification involving Cys-455; and step 3, peptide cleavage coupled to succinimide formation involving Asn-454. The evidence in support of this conclusion is detailed in the next section.
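The dependence of each step on one essential residue can be sketched as a lookup. The Python below is a deliberately simplified illustration, not a model from the paper: the names `STEPS`, `PERMISSIVE`, and `first_blocked_step` are hypothetical, substitutions are written in the paper's notation (e.g. "N454A"), and any replacement of an essential residue is treated as a complete block, with the single exception of C455S, which this paper reports as still permitting transesterification.

```python
# Essential residue for each of the first three steps, as listed in the text.
STEPS = [
    ("Step 1: N-S acyl rearrangement", "C1"),
    ("Step 2: transesterification", "C455"),
    ("Step 3: peptide cleavage coupled to succinimide formation", "N454"),
]

# Assumption for this sketch: substitutions that slow a step without blocking it.
PERMISSIVE = {"C455S"}


def first_blocked_step(substitutions):
    """Return the first main-pathway step blocked by the substitutions, or None."""
    for step, residue in STEPS:
        for sub in substitutions:
            # "C455A"[:-1] gives "C455": the original residue plus its position.
            if sub[:-1] == residue and sub not in PERMISSIVE:
                return step
    return None


for mutant in ({"N454A", "C455A"}, {"N454A", "C455S"}, {"C1S", "N454A"}):
    print(sorted(mutant), "->", first_blocked_step(mutant))
```

The sketch covers only the main pathway; side reactions such as cleavage at either splice junction are not represented.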

The Pathway of Protein Splicing Involving the Yeast VMA Intein
Step 1. N-S Acyl Rearrangement-According to our working hypothesis, this reaction should not depend on Asn-454 and Cys-455; mutant proteins lacking these residues should consist of an equilibrium mixture of the polypeptide precursor, MYT, and a linear thioester intermediate. Although the position of the equilibrium will be far toward the polypeptide, the thioester can be detected by virtue of its high reactivity with hydroxylamine (16). The results in Fig. 3B show that treatment of MYT carrying the double mutation N454A/C455A with hydroxylamine or thiols led to rapid and complete cleavage at the upstream splice junction to yield M and YT. M produced by hydroxylamine- or cysteine-induced cleavage had a C-terminal Gly hydroxamic acid or Cys residue (Fig. 3C; text), respectively, confirming that the bond cleaved by these nucleophiles was that between the C terminus of the N-extein (Gly) and the N terminus of the intein (Cys-1). The critical role of Cys-1 in this process was supported by the observation that MYT lacking this residue (C1S/N454A) was not susceptible to cleavage by hydroxylamine or thiols. It is interesting that the fusion protein with the N454A/C455A double mutation also underwent substantial cleavage at the upstream splice junction in vivo or in the course of purification (Fig. 3B, lane 1); whether this was due to hydrolysis or the action of endogenous nucleophiles remains to be determined.
Step 2. Transesterification-The second step in the postulated protein splicing pathway, the rearrangement of the thioester intermediate (involving the thiol of Cys-1) through nucleophilic displacement by the thiol of Cys-455 to yield a branched intermediate, should also be independent of Asn-454 and therefore occur in an N454A mutant. However, examination of the products expressed by such a mutant (Fig. 2B, lane 5) yielded no evidence of a branched intermediate, which would be expected to migrate more slowly than MYT on SDS-PAGE (14). One explanation could be that the N454A mutation resulted in a conformational change unfavorable for the nucleophilic displacement by Cys-455. On the other hand, consideration of the chemical equilibria involved, i.e. an N-S acyl shift followed by a transesterification, suggests that there would be little accumulation of the branched intermediate under these circumstances, because the equilibrium of the N-S acyl shift strongly favors peptide formation and the equilibrium constant for the transesterification of a thioester by a thiol (i.e. Cys-455) is about 1. A possible solution to this predicament was to replace Cys-455 by Ser, so that transesterification would involve the conversion of a thioester to an oxygen ester (a reaction with an equilibrium constant at pH 7 of about 60 (25)), which would make the branched intermediate more likely to form (and more stable in vivo and during purification). Indeed, the products expressed by an N454A/C455S double mutant included substantial amounts of a polypeptide migrating more slowly than MYT (Fig. 2B, lane 7; Fig. 4), which on N-terminal analysis proved to be a branched protein with two N-terminal sequences (Table II).
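The equilibrium argument can be made concrete with a small calculation. The Python sketch below treats the first two steps as two coupled equilibria (precursor to linear thioester, then linear thioester to branched intermediate) and computes the equilibrium fraction present as branched intermediate. The transesterification constants of about 1 (thiol) and about 60 (hydroxyl) are those quoted in the text; the value used for the N-S acyl shift, `K_SHIFT = 0.01`, is purely hypothetical, since the text states only that this equilibrium lies far toward the peptide. The function name is likewise an assumption made for this example.

```python
def branched_fraction(k_shift: float, k_transfer: float) -> float:
    """Equilibrium fraction of branched intermediate for the scheme
    precursor <-> linear thioester <-> branched intermediate,
    with equilibrium constants k_shift and k_transfer."""
    thioester = k_shift               # amount relative to precursor = 1
    branched = k_shift * k_transfer   # amount relative to precursor = 1
    return branched / (1.0 + thioester + branched)


K_SHIFT = 0.01  # hypothetical value; not given in the text

for label, k_transfer in (("Cys-455 (thiol)", 1.0), ("Ser-455 (hydroxyl)", 60.0)):
    fraction = branched_fraction(K_SHIFT, k_transfer)
    print(f"{label}: branched fraction = {fraction:.3f}")
```

Whatever small value is assumed for the acyl shift, the larger transesterification constant raises the share of branched intermediate, which is the qualitative point of the C455S substitution; the absolute numbers depend entirely on the assumed `K_SHIFT`.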
According to our hypothesis, the branch of the intermediate derived from the MYT double mutant N454A/C455S, unlike that in the wild-type intermediate, should be attached by an oxygen ester rather than a thioester bond and should therefore be alkali-sensitive but unaffected by neutral hydroxylamine, which preferentially cleaves thioesters (25). The results shown in Fig. 4B fully confirmed this prediction.
The question of whether the formation of the branched intermediate is indeed an obligatory second step in the protein splicing pathway, rather than occurring without a prior N-S shift, as suggested earlier (Pathway A in Xu et al. (17)) or after the succinimide formation (Asn-454 cyclization), as proposed by Cooper et al. (12), merits some consideration. The first possibility is clearly excluded by the observation that the branched intermediate was detected in the double mutant N454A/C455S but not in a triple mutant which had, in addition, the substitution C1S and therefore was unable to undergo an N-S acyl rearrangement (Fig. 2B, lanes 7 and 8). The N-terminal cleavage by exogenous cysteine (Fig. 3), which mimics the transesterification step in the splicing pathway, provided additional evidence that an N-S shift occurs prior to branched intermediate formation, since it is unlikely that a free cysteine can cleave a peptide bond. The second possibility (Cooper et al. (12)) is incompatible with the experimental data that demonstrate a branched intermediate with two N termini corresponding to those of the N-extein and the intein, suggesting that succinimide formation could not be the initial step in protein splicing.
Recently, an alternative mechanism was proposed for the formation of the branched intermediate in protein splicing, which postulated a nucleophilic attack by the Asn amide nitrogen on the peptide bond at the upstream splice junction (27). The observation that a branched intermediate was formed in a protein in which Asn-454 was replaced by Ala rules out such a possibility.
Step 3. Peptide Cleavage Coupled to Succinimide Formation-The observation that the first two steps of protein splicing can occur in the absence of Asn-454, but that the intermediates that accumulate under these conditions cannot undergo cleavage of the bond joining the intein and the C-extein, supports the key role of this residue in the cleavage step. Analysis of the C-terminal amino acid of the excised wild-type intein showed that it consisted of approximately equimolar amounts of Asn and aminosuccinimide, a cyclization product of Asn (Table I). This observation confirms the results of similar studies on a thermostable intein (17,18) and suggests that the cleavage of the peptide bond at the C-terminal splice junction is coupled to cyclization of Asn-454 to yield a C-terminal aminosuccinimide residue.
Step 4. Rearrangements of the Transient Splicing Products-The immediate products of the cleavage step in protein splicing are the excised intein with a C-terminal aminosuccinimide residue and the spliced exteins linked by a thioester bond. Earlier studies on model peptides showed C-terminal aminosuccinimide residues to be subject to slow hydrolysis with a half-life of about 80 h at pH 7.4 and 25°C (18). Hydrolysis of the terminal aminosuccinimide residue of the excised intein would thus occur under ambient conditions but at a rate which is slow compared with the doubling time of yeast. On the other hand, the rearrangement of thioesters of N-terminal Cys residues by S-N acyl shifts is known to be exceedingly rapid, with the equilibrium favoring the amide form (28-30). Considering that all the catalytic elements that promote protein splicing reside in the intein plus the first C-extein residue (12-14), it is significant that the final rearrangement of the ester bond linking the exteins to a normal peptide bond involves a rapid, spontaneous reaction that requires no catalytic assistance.
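To give a sense of scale for the 80-h half-life quoted above, the following Python sketch computes the fraction of aminosuccinimide remaining after a given time. It assumes simple first-order decay, and the elapsed times are arbitrary illustrative values rather than measurements from this work; the function name `fraction_remaining` is introduced only for this example.

```python
def fraction_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of starting material left after elapsed_h hours,
    assuming first-order decay with the given half-life."""
    return 0.5 ** (elapsed_h / half_life_h)


HALF_LIFE_H = 80.0  # aminosuccinimide hydrolysis, pH 7.4 and 25°C (value quoted in the text)

for elapsed_h in (2.0, 24.0, 80.0):  # illustrative times only
    left = fraction_remaining(elapsed_h, HALF_LIFE_H)
    print(f"after {elapsed_h:5.1f} h: {left:.3f} remaining")
```

Under this assumption, nearly all of the aminosuccinimide persists over a period of a few hours, in line with the statement that hydrolysis is slow relative to cell doubling.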

Side Reactions and Their Exploitation to Achieve Efficient Protein Cleavage at the Upstream or Downstream Splice Junctions
The study of the effect of amino acid substitutions on protein splicing has not only provided insights into the specific roles of the splice junction amino acids in protein splicing but has allowed us to modulate the splicing process so as to achieve efficient protein cleavage at either of the two splice junctions. Peptide cleavage is effected by the side reactions shown in Fig. 1, right side, whose efficiency can be greatly enhanced by blocking the main splicing pathway at specific steps by appropriate amino acid substitutions.
Cleavage at the Downstream Splice Junction-Cyclization of Asn-454 coupled to cleavage at the downstream splice junction can occur independently of the other reactions in the protein splicing pathway. Thus, when the first step of protein splicing was blocked by a C1S substitution, which prevents the formation of the thioester intermediate, efficient downstream cleavage occurred in vivo, and MY was purified as the major product (Fig. 2B, lane 3). Analysis of the C-terminal residue of MY thus produced showed the presence of aminosuccinimide (Table I), confirming that the abortive C-terminal cleavage was coupled to cyclization of Asn-454. Another context in which downstream cleavage occurred efficiently in vivo was the C455S substitution (Fig. 2B, lane 4). This was probably due to kinetic factors, the hydroxyl group of Ser-455 being a much weaker nucleophile than the Cys thiol so that transesterification, ordinarily more rapid than the cyclization of Asn, has become the slower of the two reactions. It illustrates that the operation of Steps 1-3 of the normal protein splicing pathway (Fig. 1) represents an intricately evolved balance of chemical reactivities and self-catalysis, whose perturbation can lead to unproductive side reactions.
Cleavage of proteins by the cyclization of Asn residues has also been found to occur in other systems, especially when the adjacent amino acid is Gly. However, because it proceeds at very low rates, this kind of protein cleavage is seen naturally only in very long-lived proteins such as the crystallins (31). In protein splicing and its side reactions, peptide cleavage coupled to Asn cyclization is much more rapid, perhaps owing to assistance by the neighboring His residue (18,39).
Cleavage at the Upstream Splice Junction-The thioester bonds involving the side chain of Cys-1 in the ester intermediate or of Cys-455 in the branched intermediate are potential sites of protein cleavage owing to the relative instability of thioesters in comparison to amides (25). However, although thermodynamically quite unstable, thioesters are kinetically almost as stable toward hydrolysis as oxygen esters (32), with half-lives in the range of 2 h at pH 7.5 (33). Nevertheless, some in vivo cleavage at the upstream splice junction could be observed in mutants with the double amino acid substitution N454A/C455A, in which the formation of the branched intermediate and downstream cleavage are blocked (Fig. 3B, lane 1). On the other hand, rapid and quantitative cleavage at the upstream splice junction could be achieved by nucleophile-assisted cleavage of the thioester bond in mutants in which downstream cleavage was blocked by the N454A substitution, using either hydroxylamine or thiols as nucleophiles (e.g. Fig. 3, A and B).
Recently, an interesting example of an analogous, self-catalyzed cleavage reaction was discovered in the hedgehog family of developmental signaling proteins (34), which may involve the formation of thioester intermediates by N-S acyl rearrangements (35). Other mechanistically similar reactions are the N-O acyl rearrangements described in a thermophilic intein (16,39) and in the autocleavage reactions of certain bacterial decarboxylases (36) and glycosylasparaginases (37).

Detection of Intermediates in an Efficient in Vitro Splicing System
As described in the preceding section, efficient protein splicing depends on a delicate balance of the reactivities of the three key amino acids that participate directly in the first three steps of the splicing pathway and whose perturbation can lead to the side reactions shown in Fig. 1, right side. The down-modulation of the splicing rates of proteins containing mesophilic inteins so as to allow the study of protein splicing in vitro thus presents a formidable challenge. A key element in our strategy for slowing the splicing process was based on the observation that, in the intein from Pyrococcus species GB-D, the His residue at the downstream splice junction is not essential for branched intermediate formation (i.e. Steps 1 and 2) but is required to assist the cleavage of the branched intermediate coupled to the cyclization of Asn (39). Whereas replacement of His-453 in the VMA intein with many amino acids completely blocked protein splicing, replacement with certain amino acids, such as Leu, afforded a low level of splicing activity (12). We reasoned, therefore, that in a double mutant such as H453L/C455S, the first three steps in the splicing process would be coordinately retarded: Step 1 by the unfavorable equilibrium of the N-S acyl shift, Step 2 by the poor nucleophilicity of a hydroxyl group in comparison with a thiol, and Step 3 by withholding the catalytic assistance of His-453. Indeed, the unspliced precursor MYT with the double substitution H453L/C455S could be isolated in good yield when expressed in E. coli at low temperatures and underwent efficient and nearly quantitative in vitro splicing with no side reactions in 16 h at 4 or 25°C (Fig. 5).
Furthermore, the splicing reaction was accompanied by the transient formation of the branched intermediate, identified by its slower electrophoretic mobility and its immunoreactivity, and of the ester intermediate, identified by its reactivity with thiols, showing that these substances are indeed produced in the course of normal protein splicing and not just dead-end side products of the aborted splicing reactions of mutant inteins.

Perspectives
The work presented here has greatly advanced our understanding of the mechanism of protein splicing, to the extent that we can now rationally redesign inteins not only to modulate the rate of protein splicing but to subvert the splicing process into efficient polypeptide cleavage at either the upstream or downstream splice junction. Nevertheless, many mechanistic details of the protein splicing reaction still need to be worked out. The first three steps in protein splicing occur at much greater rates than analogous reactions in other proteins and undoubtedly are assisted by other residues that act as acid or base catalysts to promote the three successive nucleophilic displacements involved. Although conserved residues and motifs have been identified throughout the intein sequence (38), their relevance to protein splicing is not clear because inteins are bifunctional proteins that also function as homing endonucleases (8). The three-dimensional structure of inteins must also play a critical role in protein splicing by correctly aligning the splice junctions so as to coordinate the reactions at sites that are separated by more than 360 amino acid residues. Complete characterization of all determinants involved in intein folding and the catalysis of protein splicing will require much more extensive mutagenesis studies as well as information that can be obtained only by solving the crystal structure of an intein. Nevertheless, while these challenging investigations are in progress, inteins can serve as a rich source of modules that will undoubtedly find important uses in protein engineering owing to their ability to undergo efficient, self-catalyzed, and highly specific peptide bond cleavage and rearrangement reactions.