Protein Splicing of Inteins with Atypical Glutamine and Aspartate C-terminal Residues

Inteins are protein-splicing domains present in many proteins. They self-catalyze their excision from the host protein, ligating their former flanks by a peptide bond. The C-terminal residue of inteins is typically an asparagine (Asn). Cyclization of this residue to succinimide causes the final detachment of inteins from their hosts. We studied the protein-splicing activity of two inteins with atypical C-terminal residues: one with a C-terminal glutamine (Gln), isolated from Chilo iridescent virus (CIV), and another unique intein, first reported here, with a C-terminal aspartate (Asp), isolated from Carboxydothermus hydrogenoformans (Chy). Protein-splicing activity was examined in the wild-type inteins and in several mutants with N- and C-terminal amino acid substitutions. We demonstrate that both wild-type inteins can protein splice, probably by new variations of the typical protein-splicing mechanism. Substituting the atypical C-terminal residue with the typical Asn retained protein-splicing only in the CIV intein. All of the diverse C-terminal substitutions in the Chy intein (Asp345 to Asn, Gln, Glu, and Ala) abolished protein-splicing and generated N- and C-terminal cleavage. The observed C-terminal cleavage in the Chy intein ending with Ala cannot be explained by cyclization of this residue. We present and discuss several new models for reactions in the protein-splicing pathway.

Inteins are proteins that catalyze their own excision out of diverse host proteins while ligating the polypeptide flanks (N-extein and C-extein) of their host protein. This process, termed protein-splicing, is an intramolecular event, not requiring additional enzymes. Protein-splicing, as currently understood, requires four successive steps, directly involving three conserved intein positions: (i) Cys or Ser at the N terminus of the intein; (ii) Asn at the intein C terminus; and (iii) Cys, Ser, or Thr directly following the intein C terminus (1, 2). A few inteins with an N-terminal Ala residue, which differ in the first step of the reaction, have also been described (3).
Protein-splicing typically commences by an N→O or N→S acyl rearrangement of the peptide bond between the N-extein and the N terminus of the intein into an ester intermediate. This step involves the nucleophilic attack of the thiol or hydroxyl side chain of the intein N-terminal amino acid (Cys or Ser, respectively) on the carbonyl group of the adjacent N-extein peptide bond (4). Instantaneously, the thiol or hydroxyl of the residue following the intein C terminus attacks the ester bond. In inteins with N-terminal Ala residues, the N-terminal peptide bond is not rearranged by the N-terminal Ala but by the nucleophilic group of the residue immediately following the intein C terminus (3). In either case, a branched intermediate is formed: the N-extein is connected by an ester bond to the side chain of the first C-extein residue, whereas this residue is also connected by a peptide bond to the intein C terminus (5, 6) (Fig. 1). This branched intermediate is resolved by a modification of the intein C-terminal residue. In inteins with the typical Asn C-terminal residue, it is proposed that the nucleophilic β-amide nitrogen attacks the carbonyl carbon of the peptide bond adjacent to it. This reaction is suggested to cause the cyclization of the Asn into an aminosuccinimide ring (Fig. 1) (6-8). In this step, the two exteins, ligated by an ester bond, detach from the intein. Finally, two independent events occur. The ester bond between the two exteins rapidly rearranges into the more stable amide bond, forming a peptide bond, and the aminosuccinimide C terminus of the intein undergoes gradual hydrolysis into Asn or iso-Asn residues (7).
Intein mutations that inhibit some stages of protein-splicing activity frequently generate protein-splicing side products. These include the cleavage of the C-terminal or N-terminal intein junctions and the appearance of the usually transient branched intermediate form. Mutational analysis links the formation of these side products with specific amino acids in the intein sequence. It has been shown that substitution of amino acids located at three conserved positions (the N and C termini of the intein and the amino acid directly following the intein C terminus) blocks protein-splicing but still produces some side products. In particular, substituting the C-terminal Asn with Ala in four different inteins (3, 9-11), with Gln in two inteins (9, 10), or with Asp in one intein (9) blocks C-terminal cleavage while still allowing N-terminal junction cleavage and branched intermediate formation.
An atypical intein with a C-terminal Gln was first reported in the Chilo iridescent virus (CIV) (12). Currently, only four inteins are known to have a C-terminal Gln: the CIV intein, integrated into the large subunit of ribonucleotide reductase (R1), and three alleles found within the same integration point in the archaeal PolD DNA polymerase large DP2 subunits from Pyrococcus horikoshii OT3, Pyrococcus abyssi GE5, and Halobacterium species NRC-1. It was suggested that inteins with a C-terminal Gln undergo splicing in a way similar to inteins with C-terminal Asn: by cyclization of Gln into aminoglutarimide (12). Whereas the suggested reaction is analogous to Asn cyclization into aminosuccinimide, it is chemically less favorable because of the additional CH2 group of Gln. Moreover, Asn to Gln mutations in the C termini of Saccharomyces cerevisiae and Pyrococcus species GB-D inteins abolish protein-splicing activity (9, 10).
A unique intein from Carboxydothermus hydrogenoformans DSM 6008 (Chy), first reported here, is an allele of the CIV intein. This intein is atypical, being the only known intein with an Asp in its C terminus. Current protein-splicing mechanisms cannot explain protein-splicing with a C-terminal Asp. The discovery of inteins with atypical C-terminal residues raises questions regarding their protein-splicing activity and the particular biochemical pathway in which this process proceeds.
In this study, we show protein-splicing of the CIV intein, with a C-terminal Gln, and of the Chy intein with a C-terminal Asp. C-terminal substitution of these inteins to Asn blocks protein-splicing in the Chy but not in the CIV intein. N-terminal substitution of both inteins to Ala blocks N-terminal cleavage. Mutation analysis of the Chy intein C terminus suggests that C-terminal autocleavage can proceed even with a substitution of the C-terminal Asp by alanine (Ala). Our results support C-terminal autocleavage of the Chy intein with diverse C-terminal amino acids. The Chy intein C-terminal amino acid substitution to Ala enables C-terminal cleavage that cannot proceed by C-terminal residue cyclization. The implications of these results to protein-splicing reactions in particular and protein chemistry in general, are discussed in this report.

EXPERIMENTAL PROCEDURES
DNA Samples-The CIV R1 intein gene was amplified by PCR from CIV genomic DNA (provided by James Kalmakoff, University of Otago, New Zealand). C. hydrogenoformans R1 intein gene was amplified by PCR from genomic DNA (supplied by DSMZ, Braunschweig, Germany).
Oligonucleotides and PCR Conditions-The oligonucleotides used for cloning and substitution of the C or N termini of the inteins are detailed in Supplementary Materials Table S2. PCR mixtures contained Taq DNA polymerase (1 unit), and its supplied buffer (Sigma), 200 mM dNTP, 10 mM of each primer, and 100 ng of genomic DNA all in a 50-μl reaction. Amplification was carried out using a Biometra thermal cycler.
Functional Assay of Protein-splicing-To create a functional assay for protein-splicing activity, a plasmid containing the genes malE and cbd for two protein tags, maltose-binding protein (MBP) and chitin-binding domain (CBD), was assembled as described earlier (13). The resulting plasmid, termed pC2C, is based on a modification of the pMALC2 vector (New England Biolabs, Beverly, MA). All intein inserts were cloned between the BamHI and XbaI sites upstream from the cbd and downstream of the malE coding sequences. The sequence of all intein inserts was verified by DNA sequencing.
FIG. 1. Protein-splicing mechanism pathways. Two potential pathways leading to the formation of an excised intein with a C-terminal aminosuccinimide are illustrated. Cys residues are shown at both N- and C-splice junctions of the intein and the typical Asn is shown as the intein C-terminal end. On the left (steps 1-6) is the model of Xu and Perler (4). On the right (steps 4′a to 4′c) is the variation suggested by A. L. Nussbaum (Paulus, 18) to the model of Xu and Perler. The two models only differ in the resolution of the branched intermediate (steps 4 and 4′).
Protein Expression-Growth, expression of the different inteins in E. coli, and lysis of bacterial cells followed the same procedures as described previously (13). The supernatant extracted from the lysed cells was used for further analyses.
Purification of CIV Wild-type and Mutant Inteins Splicing Products-Protein-splicing products were purified from extracted supernatant over amylose (New England Biolabs) or chitin (New England Biolabs) beads in 20 mM Tris, pH 7.4, 0.2 M NaCl (Buffer A) according to manufacturer's instructions (New England Biolabs, "Expression and Purification of Proteins from Cloned Genes Instruction Manual," version 4 and "One-Step Purification of Recombinant Proteins Using a Self-cleavable Affinity Tag Instruction Manual," version 1.2). The purified protein samples were used for mass spectrometry and SDS-PAGE. Prior to gel loading protein samples were heated for 3 min at 100°C in 62 mM Tris-Cl, 2% SDS, 10% (v/v) glycerol, 0.01% bromphenol blue, pH 6.8 (Buffer B). Protein-splicing products were eluted by Buffer A containing 10 mM maltose and Buffer B from amylose and chitin beads, respectively. CIV intein protein was purified for mass spectrometry analysis directly from a Tris glycine gel after SDS-PAGE.
Purification of Chy Wild-type and Mutant Inteins Splicing Products-Protein-splicing products were purified and eluted in the same method as the CIV protein products described above. Protein products immobilized on chitin were incubated in Buffer A at either 70 or 4°C for 20 min before elution from beads. Protein products incubated at 4°C were heated for 3 min at 100°C in Buffer B prior to SDS-PAGE, and those incubated at 70°C were directly applied on gels after the addition of Buffer B.
Purification of Chy Intein-Chy intein was purified by heat treatment of extracted supernatant at 80°C for 15 min in Buffer A with 10% glycerol. The heated sample was centrifuged at 20,000 × g for 5 min and the supernatant was subjected to SDS-PAGE. Finally, the band corresponding to the intein Mr was purified from the gel and sent for further mass spectrometry analysis.
Protein-splicing Activity Assay-Intein protein-splicing activity was determined by the appearance of the spliced and excised products. The assay relied on detection by SDS-PAGE of the intein-mediated protein-spliced product MBP-CBD (MC), as compared with a control fusion protein (MC), and on detection of the intein product (I), excised from its host protein. Detection of the MC product was supported by Western blots, and that of the Chy and CIV inteins by mass spectrometry using matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF). Protein quantity was estimated by comparing the intensity of Coomassie Blue staining of samples run on the same gel. Protein-splicing products were compared with the total amount of protein in each lane. Stained gels were dried and scanned with a UMAX 610S scanner and analyzed with NIH Image 1.62 software.
LC-ESI-MS Peptide Analysis of CIV Mutants-Protein samples of the CIV-C1A, CIV-C1A/Q339N, and C1A-C340Stop mutants were expressed and purified on an amylose affinity column as described above. Purified proteins were concentrated by Centricon® (Millipore, Bedford, MA) mini-columns with a 50-kDa cutoff. All protein samples were equilibrated to a concentration of 500 μg/ml and examined for purity using SDS-PAGE. Proteins were denatured by mixing with 8 M urea, 10 mM ammonium bicarbonate, pH 8.0. Urea concentration was lowered to 2 M by dilution with the addition of trypsin solution. Tryptic digest was conducted overnight at 25°C in 2 M urea, 10 mM ammonium bicarbonate, pH 8.0. Peptides were resolved by reverse-phase chromatography on 0.1 × 300-mm fused silica capillaries (J&W, 100 μm inner diameter) self-filled with porous R2 (Perspective). The peptides were eluted using an 80-min linear gradient of 5 to 95% acetonitrile with 0.1% acetic acid in water at a flow rate of about 1 μl/min. The liquid from the column was electrosprayed into an ion-trap mass spectrometer (LCQ, Finnigan, San Jose, CA). Mass spectrometry was performed in the positive ion mode using repetitive full MS scans, each followed by collision-induced dissociation of the most dominant ion selected from the first MS scan. The mass spectrometry data were compared with simulated proteolysis and collision-induced dissociation of the proteins in the NR-NCBI database using Sequest software (J. Eng and J. Yates, University of Washington and Finnigan, San Jose).
Antibodies-Monoclonal mouse antibodies directed at MBP (Novus Biologicals, Inc., Littleton, CO) and polyclonal rabbit antibodies directed at CBD (New England Biolabs) were used for identification of fusion protein harboring the corresponding tags. Secondary horseradish peroxidase-conjugated goat antibodies directed against mouse IgG or rabbit IgG (Jackson ImmunoResearch Laboratories, Inc., West Grove, PA) were used for the Western blot analysis.
MALDI-TOF Mass Spectrometry Analysis of Proteins and Peptides-Electroelution from the gel and in-gel digestion, intact molecular weight measurements of proteins and peptide mass mapping of enzymatically cleaved peptides using MALDI-TOF, are as previously described (13).
N-terminal Amino Acid Sequencing-Amylose or chitin-purified proteins were subjected to electrophoresis on the Tris glycine buffer system with 10% polyacrylamide gels. Gels were electroblotted to a polyvinylidene difluoride membrane (Bio-Rad) and stained with Coomassie Brilliant Blue R-250 according to the manufacturer's instructions (Hoefer, protein electrophoresis application guide). Polyvinylidene difluoride membranes were then destained in 40% methanol, 7% acetic acid followed by 90% methanol and 10% acetic acid. Protein bands corresponding to the molecular mass of MIC (88 kDa), MC (50 kDa), and IC (45 kDa) were cut with a clean scalpel from the polyvinylidene difluoride membrane and each subjected to Edman degradation. The data were acquired and analyzed on a Prosize 4.90 data system (Applied Biosystems).
C-terminal Autocleavage Rate of CIV Intein-The C-terminal cleavage rate was measured for CIV-C1A and CIV-C1A/Q339N amylose-purified samples in 20 mM Tris-Cl, 0.2 M NaCl. Duplicate samples were incubated at 37°C at three different pH values (6.4, 7.2, and 8.2) for various times, and the reaction was terminated by flash freezing the protein samples in liquid nitrogen. Samples were then kept at −70°C until analyzed by SDS-PAGE. Coomassie-stained gels were scanned with a UMAX 610S scanner and analyzed densitometrically with NIH Image 1.62 software. The only observed bands on the gels were those of the C-terminal cleavage product (MI) and the precursor (MIC). The MI band in each lane was normalized to the total intensity of MI plus MIC. The rate constants for cleavage were calculated from the slopes of the pseudo-first order plots of the natural logarithm of 1 − MI/(MIC + MI) against time (min) (Supplementary Materials Fig. S3).
Sequence Data-The C. hydrogenoformans Z-2901 R1 ribonucleotide reductase gene is present in TIGR data set 246194 contig 2341, which can be identified on the NCBI microbial genomes BLAST page.
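The rate-constant calculation described above can be illustrated with a minimal sketch. It assumes that cleavage follows pseudo-first order kinetics, so that ln(1 − MI/(MIC + MI)) falls linearly with time with slope −k. The function name and the band intensities below are hypothetical and were generated for illustration only; they are not data from this study.

```python
import math

def first_order_rate_constant(times_min, mi, mic):
    """Return k (per min) as minus the least-squares slope of
    ln(1 - MI/(MIC + MI)) plotted against time."""
    ys = []
    for mi_band, mic_band in zip(mi, mic):
        fraction_cleaved = mi_band / (mi_band + mic_band)
        ys.append(math.log(1.0 - fraction_cleaved))
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    return -slope

# Hypothetical densitometry readings, simulated with k = 0.01 per min.
times = [0, 30, 60, 120, 240]
true_k = 0.01
mic = [100.0 * math.exp(-true_k * t) for t in times]  # remaining precursor
mi = [100.0 - value for value in mic]                 # accumulated product

k = first_order_rate_constant(times, mi, mic)
half_life = math.log(2) / k  # minutes
print(f"k = {k:.4f} per min, half-life = {half_life:.1f} min")
```

Normalizing MI to MI plus MIC within each lane, as in the procedure above, makes the estimate insensitive to differences in total protein loaded per lane.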

RESULTS
Inteins were tested for protein-splicing activity when overexpressed in Escherichia coli and fused to N- and C-terminal protein tags. The precursor fusion proteins were maltose-binding protein/intein/chitin-binding domain (MIC). Reaction products were examined and the nature of the C-terminal ends of spliced inteins was studied by mass spectrometry. N- and C-terminal amino acids of both inteins were substituted to assess their protein-splicing mechanism (Fig. 2).
Protein-splicing of the CIV Intein with a C-terminal Gln-Overexpression of the CIV intein fusion protein in E. coli resulted in protein-splicing of the MIC precursor to MC and I products (Fig. 3). Identity of the putative MC protein band (the ligated exteins) was validated by comparing its apparent Mr to that of a control MC without an intein, by mass spectrometry (Table I), and by Western blot using antibodies against both chitin-binding domain (C) and maltose-binding protein (M) (Supplementary Materials Fig. S2). The I product was observed after purification on both amylose and chitin beads. This phenomenon probably results from aggregation of I with other protein products as supported by gel-filtration chromatography of the wild-type sample (not shown). Mass spectrometry of the putative I band, detected by SDS-PAGE, at the Mr of the CIV intein, showed an intact mass corresponding to the expected mass of the CIV intein (Tables I-III). Peptide mass content, following proteolytic cleavage of this protein, as measured by MALDI-TOF, identified this protein as the intein (Supplementary Materials Fig. S1).
Additional protein bands corresponding to the MIC precursor and single splice junction cleavage products MI, IC, and M were detected following SDS-PAGE (Fig. 3) and MALDI-TOF (Table I). The band corresponding to M was also observed, in smaller quantities, in the control MC fusion protein, expressed without an intein (Fig. 3). Appearance of this band is believed to be at least in part because of premature termination of either translation or transcription (13).
Amino Acid Substitution of the CIV Intein N- and C-terminal Residue-To examine the splicing pathway, the activity of mutated CIV intein was tested. Substitution of the CIV intein C-terminal Gln by Asn (CIV-Q339N) still enabled protein-splicing activity (Fig. 3). Protein-splicing yield, measured by the ratio of the ligated MC to the rest of the protein products, shows a 7-fold reduction for the CIV-Q339N mutant compared with the wild-type.
Substitution of the N-terminal Cys of both the wild-type and CIV-Q339N to Ala (CIV-C1A and CIV-C1A/Q339N) blocked N-terminal cleavage, as evidenced by the formation of only a C-terminal junction cleavage product, MI, from the MIC precursor (Fig. 3). The yield of MI produced by the in vivo C-terminal junction cleavage of the CIV-C1A/Q339N was 4-fold lower than that of MI produced by CIV-C1A (Fig. 3, lanes 7 and 8). However, the rate of MI formation from MIC measured in vitro was higher (p = 0.01 at 0.05 confidence level for paired samples t test) for the CIV-C1A/Q339N than for the CIV-C1A mutant in neutral, mild acidic, and basic conditions (Table IV and Supplementary Materials Fig. S3). Substitution of the C-terminal Gln to Ala (CIV-Q339A) blocked C-terminal cleavage of the intein but still allowed N-terminal cleavage, as apparent by the large accumulation of IC (Fig. 3, lane 5). In the CIV-C1A and CIV-Q339A mutants, cleavage was prevented at the mutated intein-extein junction point, whereas cleavage of the other, wild-type, junction point was still allowed. Expression of the CIV-C340Stop mutant, which does not contain a C-extein, produced MI, I, and M. I and M formation was enhanced by dithiothreitol or NH2OH (not shown).
The right SDS-PAGE gel has higher resolution for the larger proteins. Position and Mr of protein-splicing products MC and I, N- and C-terminal cleavage products IC, MI, and M, and the precursor MIC are shown on the side of the gel. C cleavage product (~7 kDa) was not observed in any of the samples. Samples in lanes 2, 3, 5, and 6 were purified on a chitin column. Samples in lanes 4 and 9 were purified on amylose column.
Analysis of the CIV Intein C Terminus-Inteins with a C-terminal Asn are known to form an imide ring during protein-splicing. To confirm the presence of such a cyclization analogue product in inteins with a C-terminal Gln, both the wild-type and mutant CIV inteins were analyzed by mass spectrometry.
The mass of the C-terminal peptide from the spliced CIV intein and from an intein generated by N-terminal junction cleavage of a fusion protein that did not have a C-extein (CIV-C340Stop) were compared with each other. Mass spectrometry data showed no difference between the C-terminal peptide of the CIV wild-type and that of CIV-C340Stop (Table II). These results were obtained for two tryptic and one chymotryptic peptides, for wild-type spliced and native C-terminal inteins. All masses were within 0.7 to 0.05 Da of the mass expected for a peptide with a C-terminal Gln.
Because the half-life of a glutarimide ring, if formed, is unknown, we wanted to minimize the amount of time elapsed from the moment a C-terminal cleavage event occurred to the time it is analyzed by mass spectrometry. The gradual C-terminal autocleavage of the CIV-C1A and C1A/Q339N mutants provided such a system. The C-terminal peptide masses of the MI fusion protein formed by both of these mutants and the CIV-C340Stop mutant were also analyzed. Results agreed with the previous data obtained with the spliced intein. In both approaches the C-terminal peptide of the intein was detected. However, in all cases, including that of the CIV-C1A/Q339N, no reduction of 18 Da from the expected mass of the C-terminal peptide ending with the Gln residue (Tables I-III) was detected.
Identification of a C-terminal Asp Intein-A unique intein was identified by sequence data analysis in the large subunit of ribonucleotide reductase (R1) of the hyperthermophilic bacterium C. hydrogenoformans (Chy) (Fig. 4). It is an allele of the CIV intein, present in the same integration point in R1, but not particularly related to it, compared with the other alleles of this integration point (not shown). Sequence comparison to other known inteins shows this intein to have a unique C-terminal Asp and a penultimate Leu.
Protein-splicing of the C-terminal Asp Chy Intein-Chy intein fusion protein was overexpressed in E. coli and was shown to protein splice (Fig. 5). Detection of MC and I, confirming Chy intein protein-splicing activity, was based on several methods. These included the identification of the MC band by comparison of its Mr to that of a control MC product, from a fusion without an intein, Western blot with antibodies raised against either MBP or CBD tags, and N-terminal amino acid sequencing data of MC (Supplementary Materials Fig. S2 and Table S1). The band corresponding to the Mr of the Chy intein on SDS-PAGE had the expected intact mass (Table I) and the peptide content (Supplementary Materials Fig. S1) of the Chy intein as evidenced by its mass spectrometry analysis. Other products having the apparent Mr of MI, M, IC, and MIC precursor were also detected by SDS-PAGE (Fig. 5). The C-terminal peptide mass of the Chy intein was also detected by mass spectrometry (Table II) and corresponded to its C-terminal peptide having a C-terminal Asp.
Amino Acid Substitutions of the Chy Intein-Both the N- and C-terminal residues of the Chy intein were substituted to evaluate protein-splicing activity and autocleavage of the N- and C-terminal junction points. After in vivo overexpression at 37°C, proteins were purified on both chitin and amylose beads at 4°C. To examine the effect of temperature on Chy intein activity, its chitin-bound products were incubated for 20 min at 70°C before elution from the beads. This temperature is close to the natural environment of the thermophilic bacterium where this intein is present (78°C) (14) (Fig. 5).
C1A mutation blocked protein-splicing and N-terminal cleavage, as expected. A double mutant Chy-C1A/D345A inhibited protein-splicing, and also any other autocleavage processes, as evident by appearance of only the precursor (MIC). Chy intein with the C-terminal substitution to Asn (D345N) did not show any protein-splicing activity (no MC product). However, N-terminal cleavage, as evident by the appearance of IC and M, was observed for all C-terminal mutants, namely Chy-D345A, Chy-D345E, Chy-D345N, and Chy-D345Q. Additionally, a faint protein band with the apparent Mr of the Chy intein (I) appears in protein samples purified by chitin beads.
After incubation at 70°C some of the protein bands are more prominent than others. The protein bands corresponding to the I and MI products are more prominent in the wild-type and Chy-C1A mutant, respectively. In contrast, the MIC precursor becomes less prominent for both wild-type and the Chy-C1A mutant. In addition, the putative intein band seems denser for all Chy C-terminal mutants. This band did not appear in the Chy-C1A and Chy-C1A/D345A mutants. A second band migrating slightly faster than the suspected I band appeared after heating to 70°C in the Chy-D345N and Chy-D345Q mutants.
The putative intein band from the Chy-D345A mutant was purified from gel and its peptide content was analyzed by mass spectrometry. This band had the peptide content of the Chy intein (Supplementary Materials Fig. S1) although the C-terminal peptide was not resolved. It is important to note the appearance of the suspected I band in the Chy-D345A substitution. The current protein-splicing model requires cyclization of the intein C-terminal amino acid, which is not possible for an Ala residue.
The double mutant Chy-C1A/D345A produced a single protein band on SDS-PAGE having the Mr of MIC (Fig. 5, lanes 4 and 12). N-terminal sequencing of the band corresponding to the Mr of MIC from Chy-C1A/D345A gave the expected sequence for the N-terminal end of MIC (Supplementary Materials Table S1). N-terminal sequencing of the parallel protein band from the Chy-D345N mutant showed a mixture of two amino acids for each position. This mixture corresponded to two N termini, one of the M domain (same as found for the Chy-C1A/D345A mutant) and one of the I domain (Supplementary Materials Table S1). As with other inteins, this phenomenon of two N termini indicates the formation of a branched product (5). Thus, the migration of the branched intermediate of Chy-D345N is indistinguishable from, or at least very similar to, that of the MIC band observed for Chy-C1A/D345A. In other inteins reported thus far the branched product migrated differ-

DISCUSSION
The full biochemical pathway leading to protein-splicing was studied in detail for three different inteins. These are the DNA polymerase intein from Pyrococcus species GB-D (Psp pol-1) (4, 6, 7, 9), the S. cerevisiae vacuolar membrane ATPase intein (SceVMA) (8, 15), and the KlbA intein of Methanococcus jannaschii (3). This last intein has an atypical N-terminal Ala residue and was shown to protein splice by a variation of the known pathway (3). The pathway described for these inteins shows that protein-splicing includes three basic steps: two cleavages of the insert ends and ligation of its flanks. Following cleavage of the intein N-terminal junction, cleavage of the intein C-terminal junction proceeds and enables the release of two products, the intein and its ligated flanks. Intein C-terminal junction cleavage is known to proceed via a cyclization process of the C-terminal Asn to succinimide (6-8). In this study we show that protein-splicing could also efficiently proceed in inteins with atypical native C-terminal Gln or Asp residues.
Protein-splicing without a C-terminal Asn-As more intein sequences are identified, more variant inteins with atypical amino acids in highly conserved catalytic positions are observed (3, 12, 16). Apart from the CIV intein, C-terminal Gln is found in all three known archaeal PolD DNA polymerase large DP2 subunit inteins. However, these archaeal PolD DNA polymerase inteins and the CIV intein are not particularly related to each other, compared with other intein sequences (12). All other reported intein alleles of the CIV intein, except the one from Chy, have Asn at their C termini (12). Hence, inteins with C-terminal Gln seem to have arisen independently, and within the R1 intein alleles the CIV and Chy inteins each separately diverged to have atypical C-terminal amino acids. These last two inteins are less similar to each other than to other R1 intein alleles. There are no obvious sequence features common to both these inteins. CIV is present in a mesophilic eukaryote (the virus host insect Chilo suppressalis) and Chy is a hyperthermophilic bacterium. Whereas CIV is the only reported intein to be expressed in the cytoplasm of multicellular organisms, the other C-terminal Gln inteins are present in archaea. Typical C-terminal Asn inteins are present in mesophiles, thermophiles, and hyperthermophiles and in all three domains of life. We suggest that subtle sequence features are likely to be responsible for efficient protein-splicing in the CIV and Chy inteins that have atypical C-terminal amino acids.
Protein-splicing activity of two wild-type inteins, each with a different and atypical amino acid in an active site position, indicates they protein splice by a different mechanism from other inteins. For both CIV and Chy inteins splicing can proceed by cyclization of their C-terminal residues. Our results, however, do not support the formation of a stable cyclic C-terminal amino acid. Another notable activity was C-terminal cleavage observed for Chy intein C-terminal mutants. The native Asp residue was substituted in these mutants to a variety of amino acids: Asn, Gln, Glu, and Ala. The observed cleavage in C-terminal mutants is so far unique to the Chy intein. This activity seems to require prior N-terminal cleavage and does not lead to protein-splicing. Current protein-splicing models cannot explain this activity for Ala and Glu residues.
Protein-splicing with C-terminal Gln-We offer two possible variations to explain the protein-splicing pathway of inteins with a C-terminal Gln. One explanation is Gln cyclization into a glutarimide ring (Fig. 6A) (12), analogous to the aminosuccinimide ring formation described for the Psp pol-1 and SceVMA inteins (6-8). However, ring closure into glutarimide, in comparison with closure into succinimide, is expected to be less favorable, because of the additional Gln methylene group. Another explanation is that Gln cyclizes into an anhydride such as glutaranhydride (3-amino-6-imino-tetrahydro-pyran-2-one) (Fig. 6B). Anhydride analogues are proposed to be deamidation intermediates of asparagine and aspartyl residues (17) and were suggested to provide an alternate route for Asn cyclization (18). Formation of the anhydride offers no obvious energetic advantages over formation of glutarimide, but either could be dependent on the secondary structure of Gln and its neighboring residues as was suggested for Asn and Asp (17). In either case, Gln cyclization will lead to cleavage of the peptide bond between the intein and the C-extein.
Aminoglutarimide and glutaranhydride could be hydrolyzed into Gln and iso-Gln. In both cases the ring product will have a mass lower than that of a C-terminal Gln by 18 Da, and once hydrolyzed it will return to the original mass of the C-terminal Gln residue.
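The mass comparison implied by this argument can be written as a short sketch: an observed C-terminal peptide mass is compared with the mass expected for the unmodified residue and with that mass minus one water (about 18 Da), which would indicate a ring product. The function name, the tolerance, and the example peptide masses are hypothetical choices made for illustration and are not values from the tables of this report.

```python
WATER_MONOISOTOPIC_DA = 18.0106  # monoisotopic mass of H2O

def classify_c_terminal_peptide(observed_da, expected_unmodified_da,
                                tolerance_da=0.7):
    """Classify a C-terminal peptide mass as 'unmodified', 'cyclized'
    (expected mass minus one water), or 'unassigned'.
    The default tolerance is an assumed value for illustration."""
    if abs(observed_da - expected_unmodified_da) <= tolerance_da:
        return "unmodified"
    expected_cyclized_da = expected_unmodified_da - WATER_MONOISOTOPIC_DA
    if abs(observed_da - expected_cyclized_da) <= tolerance_da:
        return "cyclized"
    return "unassigned"

# Hypothetical peptide with an expected unmodified mass of 1500.80 Da.
for observed in (1500.95, 1482.80, 1490.00):
    print(observed, classify_c_terminal_peptide(observed, 1500.80))
```

Because hydrolysis of either ring restores the original mass, a result of "unmodified" in such a comparison cannot by itself distinguish between a residue that never cyclized and one that cyclized and was subsequently hydrolyzed.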
Mass spectrometry results of the CIV intein splicing product indicate that the mass of the C-terminal peptide was not altered compared with a peptide with an unmodified Gln residue. This suggests that either there was no change at all in the C-terminal Gln side chain or that a transient change, such as a cyclization event, occurred but was not detected because of the short half-life of the intermediate product.
FIG. 6. Alternative reactions for resolving the branched intermediate of inteins with a C-terminal Gln. A, cyclization of Gln to glutarimide (12). B, cyclization of Gln to glutaranhydride. Step numbers correspond to Fig. 1.

The half-life of glutarimide in aqueous solution has been addressed for 3-(N-phenylacetylamino)-2,6-piperidinedione (19). The glutarimide moiety of this compound has a half-life of ~12 h at 40°C, pH 7. It was also noted that higher pH and temperature values shorten glutarimide half-life (19). In another study (20), it was not possible to isolate a glutarimide intermediate (carbobenzoxy-DL-α-aminoglutarimide) from alkaline aqueous solution at all because of its rapid hydrolysis. We do not know of any data regarding the half-life of glutaranhydride. Because the available data are contradictory or insufficient, it is difficult to approximate the half-life of glutarimide and glutaranhydride in proteins.

To minimize the time between the formation of a product, such as a cyclized Gln, and its mass spectrometric analysis, a fusion protein that constantly produced a C-terminal cleavage product in vitro was expressed. This CIV intein construct, having the substitution C1A, enabled the constant autocleavage of MIC to MI and C (Table IV). Detection of the C-terminal peptide of the CIV-C1A mutant by mass spectrometry (Tables I-III) enabled us to compare the observed mass of the C-terminal peptide with that expected for a modified (cyclized) or unmodified Gln. As a control, the C-terminal peptide mass of the CIV-C1A/Q339N mutant, having a C-terminal Asn, and the CIV-C1A/C340STOP mutant were also examined. This last mutant does not have a C-extein and therefore is presumed not to undergo any modifications at the C-terminal Gln after expression. We believe that the constant autocleavage of this fusion protein ensured that a short-lived cyclized C-terminal intermediate would be much more abundant at the time of mass spectrometric analysis than one originating from a fully spliced intein. In all cases mass spectrometry did not provide evidence for any cyclized product (Tables II-III). The lack of evidence for this product in the intein with the C-terminal Gln to Asn substitution (CIV-C1A/Q339N) is puzzling because succinimide formation in other inteins was previously validated using mass spectrometry (6-8). The fact that the C-terminal peptide mass of the CIV-C1A/Q339N mutant fits only that of an unmodified Asn emphasizes that the inability to detect a modified (cyclized) C-terminal residue is not limited to C-terminal Gln. These observations therefore either result from the apparatus and method used for analysis or originate from the splicing process itself, which might involve a different pathway. Hydrolysis studies with the succinimide originating from the Psp GB-D intein displayed a half-life of 17 h at pH 7.4 and 37°C, conditions that are similar to our analysis and purification conditions.
We thus conclude that a cyclized product, if formed, is much more likely to be a readily hydrolyzed anhydride (21,22) rather than a glutarimide.
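The half-life argument can be illustrated numerically. The following is a minimal sketch that assumes simple first-order decay of the cyclized intermediate; the half-lives are those quoted above for the Psp GB-D succinimide and the glutarimide model compound, whereas the elapsed times and the function name are hypothetical and do not describe the actual handling times in this study.

```python
def fraction_remaining(elapsed_h, half_life_h):
    """Fraction of an intermediate left after elapsed_h hours of first-order decay."""
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    return 0.5 ** (elapsed_h / half_life_h)


# Half-lives quoted in the text (hours); elapsed times are hypothetical.
half_lives_h = {
    "succinimide, pH 7.4, 37 C": 17.0,
    "glutarimide model compound, pH 7, 40 C": 12.0,
}
for label, t_half in half_lives_h.items():
    for elapsed in (1.0, 6.0, 24.0):
        left = fraction_remaining(elapsed, t_half)
        print(f"{label}: {left:.2f} left after {elapsed:g} h")
```

Under this assumption, an imide with a half-life of many hours should still be partly present after a comparably short handling time, whereas a much shorter-lived species such as an anhydride would not.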
Protein-splicing with C-terminal Asp-The Chy intein naturally includes a C-terminal Asp. The mass spectrometry data corresponding to the C-terminal peptide of the Chy intein and the mass of the whole intein show that the final state of the intein is with a C-terminal Asp. We propose that the Chy intein undergoes C-terminal cleavage by one of two routes. One possibility is for the C-terminal Asp to cyclize into a succinic anhydride (Fig. 7A). Detection of the anhydride species of this intein by our present mass spectrometry apparatus is highly unlikely because of this species' very short half-life in aqueous medium. Some evidence for the formation of succinic anhydride in a protein context has been reported for the C-terminal Asn of the insulin A-chain (23). The observation of protein cleavage after aspartyl residues exposed to dilute HCl or formic acid (pH 2) and high temperature (108°C) (24) led to the suggestion that the intermediate formed under these conditions is a succinic anhydride. Once formed, it allows cleavage between Asp and the adjacent amino acid. It has been further suggested that cleavage on the C-terminal side of Asp is catalyzed by the Asp side chain (25). As far as we know, catalysis of this type of Asp COOH-peptide bond cleavage under milder conditions is known only in particular cases in proteins, between Asp and Pro. Lability of the Asp-Pro peptide bond has been reported under less drastic acidic conditions (pH 4) or lower temperatures (40°C) (25,26).
The protein cleavage we observed cannot be attributed to sample preparation for SDS-PAGE. In the standard procedure, protein samples are boiled in SDS for a few minutes. Chitin-purified products of Chy inteins with C-terminal Asp prepared in this way included only trace amounts of C-terminal cleavage products (Fig. 5A, lanes 2 and 3, top gel). The same samples, incubated at 70°C without SDS and loaded directly on the gel, generated large amounts of C-terminal cleavage products (Fig. 5A, lanes 2 and 3, bottom gel). C-terminal cleavage products were also readily observed in a control experiment in which amylose-purified Chy wild-type and mutant intein products were subjected to SDS-PAGE without pre-heating (not shown).
C-terminal cleavage during protein-splicing can also occur as suggested by Nussbaum (personal communication cited in Ref. 18). In this model the free N-terminal nucleophilic residue of the intein attacks the peptide bond linking the C-terminal amino acid of the intein to the C-extein, in a manner analogous to catalysis by N-terminal nucleophile amidohydrolases (Fig. 1). Our results on C-terminal autocleavage with amino acid substitutions of the Chy C-terminal Asp provide evidence to support this hypothesis. In all four Chy intein C-terminal substitutions tested (Asp to Asn, Gln, Glu, or Ala), a protein band corresponding to the relative molecular mass (Mr) of the Chy intein appeared along with the IC and M products (Fig. 5).
C-terminal cleavage of the Chy-D345A mutant intein cannot be explained by the current protein-splicing model, which requires cyclization of the C-terminal amino acid. The peptide content of the putative I band showed the peptide composition of the Chy intein. However, the C-terminal peptide of the intein was not detected (Supplementary Materials Fig. S1). This is probably a technical problem, because peptide mass mapping frequently covers only part of the protein and the examined band was distinct, unlike a degradation product. We believe that the intein produced in the C-terminal mutants of the Chy intein is not a result of protein-splicing because no ligated MC product was detected. We suggest that the intein in these mutants is produced only following N-terminal cleavage. This is supported by the absence of the N-terminal cleavage product (MI) in all of the Chy intein C-terminal mutants, whereas the C-terminal products, M and IC, were abundant (Fig. 5). All this suggests that the intein product is not formed by protein-splicing, but by C-terminal cleavage of I from the IC product. The inability of Chy C-terminal mutants to undergo C-terminal cleavage without prior N-terminal cleavage is also supported by the products of the Chy-C1A/D345A double mutant. Expression of this mutant produced only an inactive MIC precursor. If C-terminal cleavage were independent of N-terminal cleavage, the precursor of this mutant should have autocleaved into the MI product.
Occurrence of C-terminal autocleavage in the Chy-D345A mutant suggests that C-terminal cleavage of the Chy wild-type intein, as well as other C-terminal mutants, could also occur without prior cyclization of their C-terminal residue. The C-terminal end of the Chy intein might be prone to cleavage because of specific chemical and structural constraints. We suggest alternative models for C-terminal autocleavage of this intein. C-terminal cleavage in these models does not depend on the identity of the C-terminal residue. One possibility relies on nucleophilic attack by the intein N-terminal Cys on the peptide bond on the intein C-terminal end (Fig. 7B). Another possibility is illustrated in Fig. 7C. It suggests involvement of the Cys downstream of the intein in formation of an oxythiazolidine anion tetrahedral intermediate. Once the oxythiazolidine anion is formed, a nucleophilic attack of the N-terminal Cys thiol group can promote the collapse of the intermediate and the formation of a thioester bond between the N-terminal Cys and the C-terminal residue.

FIG. 7 (partial legend). Step numbers correspond to Fig. 1. B, C-terminal cleavage of Chy intein IC product by direct nucleophilic attack of the N-terminal Cys on the carbonyl carbon of the C-terminal peptide bond. C, C-terminal cleavage of Chy intein IC product by nucleophilic attacks of both the Cys C-terminal to the intein and the intein's N-terminal Cys. R1 designates different amino acid residues.
The C-terminal cleavage observed in the Chy mutants may be only a side-reaction caused by alteration of the original C-terminal residue. However, it contributes to the understanding of protein-splicing as a whole by suggesting how the sequential order of reactions in the pathway is regulated. Formation of the MI product from the MIC precursor in the Chy C1A mutant shows that C-terminal cleavage still occurs without a nucleophilic N-terminal amino acid side chain. Thus, C-terminal cleavage is not totally dependent on N-terminal cleavage. However, even a partial dependence of C-terminal cleavage (such as a reduced cleavage rate) explains regulation of the order of events in protein-splicing. This partial dependence could be based on attenuation of C-terminal cleavage until N-terminal cleavage occurs, as was suggested earlier (18).
Activity of C-terminal Mutants-Substitution of the C-terminal amino acid to Asn in both inteins (CIV-Q339N and Chy-D345N) led to different results. The CIV intein can protein-splice with either the native C-terminal Gln or the Asn substitution. However, the CIV-Q339N mutant had a 7-fold reduction in yield when compared with the wild-type (Fig. 3). The lower protein-splicing yield might be because of differences in the yield of C-terminal cleavage. To test this hypothesis, the N-terminal residue of the intein was substituted to Ala (CIV-C1A), blocking N-terminal cleavage while still enabling C-terminal cleavage. C-terminal cleavage was compared between two types of N-terminal C1A mutants, one with the wild-type C-terminal Gln (CIV-C1A) and one with a C-terminal mutation to Asn (CIV-C1A/Q339N). These mutants showed different yields of C-terminal cleavage product immediately after extraction from E. coli (Fig. 3, lanes 7 and 8). C-terminal cleavage was attenuated for the CIV-C1A/Q339N mutant compared with the CIV-C1A mutant. The yield of C-terminal cleavage after overexpression in E. coli is thus higher for the intein with a C-terminal Gln. However, when the rate of C-terminal cleavage was examined in vitro, the CIV-C1A mutant showed a lower rate of cleavage than that of the CIV-C1A/Q339N mutant (Table IV). We conclude that the rate and subsequent yield of C-terminal cleavage are influenced by the different in vitro and in vivo environments.
The Chy-D345N mutant formed a stable branched product but no MC splicing product (Fig. 5 and Supplementary Materials Table S1). Hence, there was no cleavage of the intein C-terminal end in the branched product of this mutant. However, we suggest that such a cleavage occurs in the IC product of this and other Chy C-terminal mutants (Fig. 7B). This difference in cleavage is probably because of differences between the intein C-terminal end in the branched product and in the IC product. In the branched product the side chain of the Cys following the intein is not free but linked by a thioester bond to the N-extein. This probably creates a different local environment around the C-terminal end of the intein that may inhibit cleavage at this end in the branched product. The N-extein might also sterically hinder this cleavage in this mutant. We note that it is not yet known what stabilizes the usually transient branched product of some mutants in Chy and other inteins.
General Implications-Cleavage of peptide bonds following Gln and Asp residues in inteins suggests that other proteins may autocleave themselves at sites located after these amino acids using similar reactions. Protein-splicing activity of inteins with atypical C-terminal residues also illustrates the robustness of this process. Different inteins catalyze complex chemical reactions by various amino acids to accomplish protein-splicing.
Type B bacterial intein-like domains occur in different bacteria (13). Analogous to inteins, a conserved position in the C-terminal end of all type B bacterial intein-like domains includes Cys, Ser, or Thr, corresponding to the first amino acid in C-exteins. However, unlike in inteins, no conserved residue precedes this position in type B bacterial intein-like domains. The preceding position is not well conserved but mainly includes Glu, Gly, and Leu (13). Our current findings and suggested model, regarding the Chy intein, may provide a basis for investigating the biochemical pathway of type B bacterial intein-like domains.
Gln and Asn deamidation is a well-established general phenomenon in proteins, related to protein aging and sometimes causing protein cleavage (27-29). This is a major issue in the stabilization of therapeutic proteins that need to be stored for relatively long periods (30). However, in contrast to Asn deamidation into succinimide, which can cause C-terminal cleavage (27,31,32), Gln deamidation into an anhydride or the analogous glutarimide has never been reported in proteins and peptides. Given the example of the autocleavage processes in the inteins reported here, it is possible that Gln and Asp could also support autocleavage in a similar manner in aging and pharmaceutical proteins.