An Unprecedented Combination of Serine and Cysteine Nucleophiles in a Split Intein with an Atypical Split Site*

Background: Intein-mediated protein splicing involves rearrangement of the polypeptide backbone through ester intermediates and excises the intein from the precursor.
Results: An unusual split intein was identified that converts an oxoester into a thioester.
Conclusion: Inteins can catalyze two consecutive formally uphill reactions.
Significance: The first example of this special case in protein splicing underlines the mechanistic diversity of inteins.

Protein splicing mediated by inteins is a self-processive reaction leading to the excision of the internal intein domain from a precursor protein and the concomitant ligation of the flanking sequences, the extein-N and extein-C parts, thereby reconstituting the host protein. Most inteins employ a splicing pathway in which the upstream scissile peptide bond is consecutively rearranged into two thioester or oxoester intermediates before intein excision and rearrangement into the new peptide bond occurs. The catalytically critical amino acids involved at the two splice junctions are cysteine, serine, or threonine. Notably, the only potential combination not observed so far in any of the known or engineered inteins corresponds to the transesterification from an oxoester to a thioester, which suggested that this formal uphill reaction with regard to the thermodynamic stability might be incompatible with intein-mediated catalysis. We show that corresponding mutations also led to inactive gp41-1 and AceL-TerL inteins. We report the novel GOS-TerL split intein identified from metagenomic databases as the first intein harboring the combination of Ser1 and Cys+1 residues. Mutational analysis showed that its efficient splicing reaction indeed follows the shift from oxoester to thioester and thus represents a rare diversion from the canonical pathway. Furthermore, the GOS-TerL intein has an atypical split site close to the N terminus. The IntN fragment could be shortened from 37 to 28 amino acids and exchanged with the 25-amino acid IntN fragment from the AceL-TerL intein, indicating a high degree of promiscuity of the IntC fragment of the GOS-TerL intein.

Inteins are internal protein elements that excise themselves from a precursor protein in a self-processive reaction called protein splicing (Fig. 1a; for reviews see Refs. 1-4). The flanking N- and C-terminal extein sequences are concomitantly linked with a peptide bond (5,6). Protein splicing is a remarkable multistep pathway involving a series of bond rearrangements and nucleophilic displacements (Fig. 1b) (7-10). Although the overall protein structure and mechanistic logic are very similar for all inteins, different classes can be defined according to major differences in the protein splicing pathway. The class 1 inteins constitute the largest group of inteins and employ the canonical protein splicing pathway (Fig. 1) (9,11). Here, the scissile peptide bond upstream of the intein between the -1 extein residue and the first amino acid of the intein (residue 1), which can be a cysteine or a serine, is first rearranged into the thioester or oxoester by an N-S or N-O acyl shift, respectively. This linear ester intermediate with higher energy is then attacked by the nucleophilic side chain of the first residue downstream of the intein, the +1 residue, which can be a cysteine, serine, or threonine. Following this transesterification to the branched intermediate, asparagine cyclization at the ultimate position of the intein leads to formation of a C-terminal succinimide ring on the intein and to the cleavage of the downstream scissile peptide bond. The liberated N- and C-terminal extein segments are linked at this point via the side chain ester of the +1 residue, but rearrange quickly into the stable peptide bond by intramolecular attack of the α-amino group. Class 2 inteins divert from this splicing pathway in that they omit the first step and catalyze instead a direct attack of the +1 nucleophilic side chain on the upstream peptide bond (11,12). Class 3 inteins also do not show peptide bond rearrangement at the upstream splice junction, but have an additional internal cysteine side chain that forms a thioester with the N-extein fragment through direct attack on the upstream peptide bond, before transesterification to the +1 residue and the further canonical order of events occur (11).
Inteins can be regarded as single turnover catalysts. To catalyze protein splicing, several residues have important or essential functional roles. Most of these residues can be recognized as highly or reasonably well conserved in short signature sequences, designated block A, B, F, and G or motifs N1, N2, N3, C2, C1 (alternative designation) (13)(14)(15). We will refer to these as block A/N1, block B/N3 etc. In addition to the different classes of inteins, also more subtle differences regarding the contribution of individual residues to catalysis have been noted. A conserved residue essential in one particular intein may be only important for catalysis in another intein.
It is not understood why some class 1 inteins use cysteine and others serine at the catalytic position 1. Likewise, the reason for using cysteine, serine, or threonine at the catalytic position +1 remains enigmatic. Interestingly, however, clear preferences for the different combinations of these key residues can be observed. Liu and co-workers (17) have analyzed the intein database (InBase) (16) and found a bias in the possible combinations of the 1 and +1 positions. The Cys1/Cys+1 combination was the most common one (36%), followed by Cys1/Ser+1 and Cys1/Thr+1 (29 and 12%, respectively). Much rarer were the combinations Ser1/Ser+1 and Ser1/Thr+1, which employ only oxoester chemistry (both 5%). Strikingly, the combination of Ser1 and Cys+1 has not been reported so far in any naturally occurring intein, and it is unclear why this is the case.
The findings from naturally occurring inteins were also tested in several studies using site-directed mutagenesis to artificially change the combinations of the 1 and +1 residues. Importantly, in no case was an efficiently splicing intein with a Ser1/Cys+1 combination reported. For example, a C1S mutation of the highly active Npu DnaE intein (native Cys1/Cys+1 combination) inhibited protein splicing (18). Also the Sce VMA intein (Cys1/Cys+1 combination) showed no splicing activity with a C1S mutation (19). In one report the same intein with a combination of C1S and H362N mutations in the block B/N3 motif was described to yield an active intein; however, no data to support this notion were shown (20,21). Because only short exteins of a few amino acids were used in this study, a gel-based readout could not distinguish between splicing and cleavage reactions. Furthermore, a systematic study with six inteins showed that none of these accepted the Ser1/Cys+1 combination (17). Together, these observations raise the question whether a Ser1/Cys+1 combination is potentially incompatible with an efficiently splicing intein. The simple chemical explanation would be that a thioester is more energy-rich than an oxoester (22).
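The energetic argument can be made explicit for every residue 1/residue +1 pair. The following minimal sketch assumes only the qualitative ranking stated in the text (thioester more energy-rich than oxoester); the rank values are ordinal placeholders rather than measured energies, and the function and variable names are illustrative.

```python
# Classify the transesterification step (linear ester -> branched ester)
# for each residue 1 / residue +1 combination.
# Assumption: only the qualitative ranking thioester > oxoester is used.

ESTER_TYPE = {"Cys": "thioester", "Ser": "oxoester", "Thr": "oxoester"}
ENERGY_RANK = {"oxoester": 0, "thioester": 1}  # ordinal ranks, not real energies


def transesterification_direction(res1: str, res_plus1: str) -> str:
    """Return 'downhill', 'level', or 'uphill' for the second splicing step."""
    start = ENERGY_RANK[ESTER_TYPE[res1]]      # linear ester formed at residue 1
    end = ENERGY_RANK[ESTER_TYPE[res_plus1]]   # branched ester formed at residue +1
    if end < start:
        return "downhill"
    if end > start:
        return "uphill"
    return "level"


if __name__ == "__main__":
    for res1 in ("Cys", "Ser"):
        for res_plus1 in ("Cys", "Ser", "Thr"):
            direction = transesterification_direction(res1, res_plus1)
            print(f"{res1}1/{res_plus1}+1: {direction}")
```

Under this ranking, Ser1/Cys+1 is the only pair for which the transesterification is formally uphill.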
In split inteins the intein domain is separated on two polypeptides. For protein trans-splicing to occur, the N-intein (Int N) and C-intein (Int C) fragments have to associate and reconstitute into the active intein (Fig. 1). Various split sites within the conserved intein-fold have been created artificially and led to active and biotechnologically useful inteins (23). Only a small fraction of naturally occurring inteins is split, and until very recently all known split inteins exhibited the split point at the same position ~35-40 residues upstream of the C-terminal end of the intein sequence (total size about 130-150 aa) (24). This split point corresponds to the typical insertion site of the homing endonuclease domain found in maxi-inteins. Very recently, the first atypically split intein was discovered in metagenomic databases and biochemically characterized. The AceL-TerL intein consists of a short Int N fragment of 25 aa and a C-terminal fragment of 104 aa (25). Metagenomic databases had already proved to be a rich source for new split inteins with interesting properties (26,27). For example, the gp41-1 intein is the fastest splicing split intein described so far (27).
In this work, we report on the identification and biochemical characterization of a novel naturally split intein, the GOS-TerL intein, identified from metagenomic sources. This intein employs the previously unknown combination of Ser1 and Cys+1 as catalytic residues. Furthermore, it is the second naturally split intein with an atypical split site close to the N terminus. Using site-directed mutagenesis, we establish the role of key residues in the protein splicing pathway. We show that the GOS-TerL intein belongs to the class 1 inteins; however, its diverting splicing pathway involves the unusual shift from an oxoester to a thioester intermediate.

Experimental Procedures
General-Unless otherwise specified, standard protocols were used. Oligonucleotides were purchased from Biolegio (Nijmegen, Netherlands). All plasmids were confirmed by DNA sequencing (GATC, Konstanz, Germany, and Seqlab, Göttingen, Germany). The plasmids with synthetic DNA were ordered from MrGene (Konstanz, Germany). All reactions were performed at least in triplicate. Reagents were purchased from 5-Prime (Hamburg, Germany), Acros Organics (Nidderau, Germany), Agilent (Böblingen, Germany), AppliChem (Darmstadt, Germany), Fluka (Taufkirchen, Germany), GE

FIGURE 1. Protein splicing reaction. The reactions are shown here for protein trans-splicing. In cis-splicing the intein fragments are fused in one continuous intein. a, general scheme of protein trans-splicing with the possible cysteine, serine, and threonine residues at the 1 and +1 positions of the intein or the C-terminal extein (Ex C), respectively. b, unusual protein trans-splicing pathway proposed for the GOS-TerL intein described herein. Remarkable and unprecedented is the second rearrangement from an oxoester into a thioester.
Construction of Expression Plasmids-A piece of synthetic DNA was used to encode the GOS-TerL intein with five native extein residues on each side and with codon usage optimized for high level expression in Escherichia coli (see native gene sequence in NCBI GenBank™ entry AACY020533749.1). The KVEFE-Int N -His 6 encoding fragment was excised from this vector via BamHI and HindIII and ligated into the pMAL-c2X backbone to generate MBP-Int N -His 6. In a similar manner, the Int C -CEFLG encoding fragment was inserted into pVS01 (28) using the NdeI and KpnI restriction sites to give Int C -Trx-His 6. Site-directed mutagenesis was performed according to the QuikChange protocol (Stratagene).
Expression and Purification of Proteins-E. coli BL21(DE3) cells were transformed with each of the expression plasmids (see Table 1 for sequence information of the encoded proteins). The individual expression strains were grown at 37°C in LB medium until A 600 = 0.5-0.7 was reached. The temperature was then reduced to 28°C and 0.4 mM isopropyl 1-thio-β-D-galactopyranoside or 0.2% arabinose (Table 1) was added to induce expression. After 4 h cells were spun down and resuspended in nickel-nitrilotriacetic acid buffer (50 mM Tris/HCl, 300 mM NaCl, pH 8) with 8 M urea. Denaturing purification conditions led to higher yields for Int C proteins and higher splicing efficiency of Int N full-length proteins. Cells were disrupted using a Potter-Elvehjem homogenizer. Proteins were purified as previously described using nickel-nitrilotriacetic acid chromatography (28).
Protein Splicing Assay-Protein trans-splicing with purified proteins or peptides was performed by mixing complementary intein fragments in splicing buffer (50 mM Tris, 300 mM NaCl, 1 mM EDTA, and 2 mM DTT, pH 7.0). Reaction partners were at a concentration of 7 to 45 µM with an excess of Int N if not stated otherwise. Proteins were preincubated at specified temperatures before the reaction was started by mixing the components. At various time points aliquots were removed and the reaction quenched by adding 4× SDS-PAGE loading buffer (containing 8% SDS and 20% β-mercaptoethanol) except for oxoester cleavage assays (see below). Samples were boiled (96°C, 10 min) and analyzed by SDS-PAGE gels stained with Coomassie Brilliant Blue. Relative intensities of protein bands were densitometrically determined using the program ImageJ.
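How band intensities were converted into splicing yields is not detailed here. One common approach, shown below purely as an illustrative sketch, divides each Coomassie band intensity by the molecular mass of the protein (assuming the stain signal scales with protein mass) and expresses the splice product as a fraction of product plus remaining limiting precursor. The function name, the intensities, and the masses in the example are hypothetical and are not values from this study.

```python
def splicing_yield(product_intensity: float, precursor_intensity: float,
                   product_kda: float, precursor_kda: float) -> float:
    """Fraction of the limiting precursor converted into splice product.

    Assumption: Coomassie band intensity is proportional to protein mass,
    so intensity / molecular mass is proportional to the molar amount.
    """
    product_mol = product_intensity / product_kda
    precursor_mol = precursor_intensity / precursor_kda
    total = product_mol + precursor_mol
    if total == 0:
        raise ValueError("both band intensities are zero")
    return product_mol / total


if __name__ == "__main__":
    # Hypothetical ImageJ intensities and molecular masses (kDa), for illustration only.
    fraction = splicing_yield(product_intensity=6000.0, precursor_intensity=750.0,
                              product_kda=60.0, precursor_kda=30.0)
    print(f"splice product fraction: {fraction:.2f}")
```

Normalizing by mass matters whenever product and precursor differ substantially in size, because equal molar amounts then give unequal band intensities.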

Results and Discussion
Identification of the GOS-TerL Intein and Sequence Comparison with the AceL-TerL Intein-We identified a new intein with the unique combination of a Ser1 residue at the upstream splice junction and a Cys+1 residue at the downstream splice junction. This intein was found in the metagenomic database at NCBI by using the amino acid sequence of the AceL-TerL intein (25) as search query. The two fragments were identified manually and the putative intein boundaries were defined by the low homology to other inteins. The sequence data were collected by a global ocean sampling project at the Punta Cormorant hypersaline lagoon on Floreana Island, Galapagos (31). As predicted from the DNA sequence, the new intein consists of 152 aa and is naturally split into an Int N fragment of 37 aa and an Int C fragment of 115 aa. The NCBI GenBank™ entry numbers for these two proteins are ECW09139.1 and ECW09138.1; the entries do not yet contain an annotation as putative intein. The two genes encoding the fragmented host and split-intein parts are in a consecutive arrangement separated by only 4 nucleotides. Only one DNA sequence read covering this split region is deposited in the database. The host protein has a high similarity to the phage terminase DNA packaging enzyme large subunit (gp17) (32). According to its origin from Punta Cormorant in the global ocean sampling project and its integration point in a terminase large subunit, we refer to the intein as GOS-TerL. The closest homolog of the GOS-TerL intein is the very recently reported naturally split AceL-TerL intein (129 aa) (25) (Fig. 2). The Int N and Int C sequences show 30 and 33% identity, respectively. The most significant differences appear to be localized at the residue at position 1 and the A/N1 motif. Both inteins are integrated at the same point in very homologous phage terminase large subunits (68% identity). The inteins have identical flanking residues at their immediate N′ (EFE) and C′ (CEFLG) ends.
Despite these similarities the origins of the metagenomic samples are very different. The AceL-TerL intein was discovered from environmental samples in Ace Lake in Antarctica, where the average water temperature was 1-2°C, whereas the DNA encoding the GOS-TerL intein was sampled from a hypersaline lagoon at the Galapagos Islands in Ecuador, where the temperature was 37°C (31). Both habitats are hypersaline salt water bodies.
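Percent identity values such as those quoted for the Int N and Int C fragments depend on the alignment and on the choice of denominator. The sketch below shows one common convention (identical residues divided by the number of aligned columns without gaps); it is an assumption about the convention, not a statement of how the values above were computed, and the toy sequences are hypothetical rather than the actual intein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical pre-aligned toy sequences, for illustration only.
    seq_a = "SLTG-DKVEL"
    seq_b = "CLSGEDKIEL"
    print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length including gaps, which lowers the value for gapped alignments.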
To rationalize the split site of the GOS-TerL intein within the conserved HINT-fold of inteins, we predicted its structure using intensive modeling on the Phyre2 server (33). The highest structural similarity was assigned to the Pab PolII intein (Protein Data Bank number 2LCJ) (34) and the Pho RadA intein (Protein Data Bank number 4E2U) (35). As shown in Fig. 3, a split site after 37 aa would correspond to the loop structure on the surface of the intein-fold. This position corresponds to position 24/25 following α-helix 1 reported for the AceL-TerL intein (25) (see below). The GOS-TerL intein thus represents the second example for this natural, atypical split site.
In Vivo Splicing in Cis and in Trans of the GOS-TerL Intein-To functionally characterize the GOS-TerL intein we ordered synthetic DNA deduced from the amino acid sequence with codon optimization for E. coli and cloned it into expression vectors. A gene fragment encoding the artificially fused intein fragments linked by a Gly-Ser dipeptide linker, and with five native N and C terminally flanking residues, KVEFE and CEFLG, respectively, was inserted between maltose-binding protein (MBP) and thioredoxin (Trx) as model exteins (Fig. 4a) (36). Following expression in E. coli, we could identify in total cell extracts large amounts of ligated exteins MBP-Trx and the liberated intein (Fig. 4c, left panels), as well as minor amounts of unspliced precursor. Western blotting using an anti-Trx antibody confirmed this interpretation. Furthermore, the antibody weakly stained a protein that from its molecular weight would correspond to the protein Int C -Trx involving a split intein fragment. We wondered whether this finding resulted from the split nature of the intein and reflected an internal start of translation, although for the cis-splicing construct a non-native codon usage optimized for E. coli was used (data not shown). To test expression of the Int C -Trx protein in the context of the natural bicistronic arrangement of the proposed consecutive genes we inserted 14 bp of the native DNA sequence upstream of the codon for the starting methionine (Fig. 4b). This sequence also included the stop codon of the Ex N -Int N encoding gene. Using this construct, the Int C -Trx protein could clearly be detected by Western blot analysis, although much weaker expressed than the MBP-Int N gene, along with the splice product MBP-Trx as a result of a trans-splicing reaction (Fig. 4c, right panels). Together, these findings show that the intein is active in splicing in cis and in trans and that the translation of the Int C -Ex C protein is started at an internal site.
Protein trans-Splicing using Purified Precursors-We prepared fusion constructs of the individual split intein fragments to test for controlled protein trans-splicing. MBP-Int N -His 6 and Int C -Trx-His 6 were expressed in E. coli and purified under denaturing conditions using the C-terminal His 6 tag. Purification of the proteins under native conditions was found to result in lower splicing activity. Following removal of the denaturant urea by dialysis, splice reactions were started by mixing the individual proteins with a 3-fold excess of MBP-Int N -His 6 at 8, 30, and 37°C. At all temperatures tested, the trans-splicing reactions went to about 80-85% completion after 2 h (Fig. 5). The rate was highest at 30°C, where the reaction reached its plateau after about 15 min. No C-terminal cleavage was detectable under these conditions. The reaction kinetics were determined from a densitometric analysis (Table 2). An analysis of the splice product MBP-Trx-His 6 by tandem mass spectrometry revealed the predicted peptide containing the ligated extein sequences (Fig. 5d). This result confirmed our definition of the intein-extein boundaries of the novel and unusual intein. Together, these results show that the GOS-TerL intein precursors are highly active in protein trans-splicing when purified separately, and exhibit excellent reaction rates.
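The kinetic model behind the rate constants is not stated in this section. As a hedged illustration, the sketch below assumes pseudo-first-order product formation, P(t) = P_max(1 - e^(-kt)), which is often applied when one fragment is in excess, and estimates k by a linearized least-squares fit with a fixed plateau. The function names and the data are synthetic placeholders, not measurements from this work.

```python
import math


def fit_first_order_rate(times_min, fractions, plateau):
    """Estimate k (per min) from P(t) = plateau * (1 - exp(-k * t)).

    Uses the linearization -ln(1 - P/plateau) = k * t and a least-squares
    line through the origin. The plateau is treated as known (assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for t, p in zip(times_min, fractions):
        if not 0.0 <= p < plateau:
            continue  # points at or above the plateau cannot be linearized
        y = -math.log(1.0 - p / plateau)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("no usable data points")
    return numerator / denominator


def half_life(k):
    """Half-life (same time unit as 1/k) of a first-order process."""
    return math.log(2.0) / k


if __name__ == "__main__":
    # Synthetic time course generated from k = 0.2 per min and a plateau of 0.85.
    times = [1.0, 2.0, 5.0, 10.0]
    data = [0.85 * (1.0 - math.exp(-0.2 * t)) for t in times]
    k = fit_first_order_rate(times, data, plateau=0.85)
    print(f"k = {k:.3f} per min, t1/2 = {half_life(k):.2f} min")
```

With noisy densitometric data, a nonlinear fit that also optimizes the plateau would usually be preferable to this linearization, which weights late time points heavily.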

Mutational Analysis of the Block A/N1 and B/N3 Motifs-
The GOS-TerL intein exhibits a unique and unprecedented combination of Ser1 and Cys+1 residues at the N- and C-terminal splice junctions. To investigate their participation and other mechanistic details in the protein splicing reaction we performed an extensive mutational analysis. Our hypothesis was that the intein first forms an oxoester involving the Ser1 side chain by N to O shift at the scissile peptide bond, followed by the unusual attack of the Cys+1 thiol moiety onto the oxoester to form a branched thioester intermediate, and the final intein-catalyzed reaction of asparagine cyclization at the C-terminal splice junction (Fig. 1b). We introduced several mutations into the separately produced, recombinant Int N and Int C containing constructs and then monitored protein trans-splicing and cleavage reactions. Table 3 gives an overview of all tested mutations.
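The mutation labels used throughout this analysis follow a compact convention: wild-type residue, position, replacement residue, with intein residues numbered from 1 and the first C-extein residue written as +1; double mutants are joined by a slash. The helper below is a hypothetical sketch of how such labels could be parsed for bookkeeping; it is not part of the published workflow.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    wild_type: str     # one-letter code of the original residue
    position: int      # residue number (intein numbering, or extein offset)
    in_c_extein: bool  # True for "+" positions such as C+1A
    replacement: str   # one-letter code of the new residue


_LABEL = re.compile(r"^([A-Z])(\+?)(\d+)([A-Z])$")


def parse_mutations(label: str) -> List[Mutation]:
    """Parse labels such as 'S1A', 'C+1A', or 'C86S/C117A'."""
    mutations = []
    for part in label.split("/"):
        match = _LABEL.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation label: {part!r}")
        wild_type, plus, position, replacement = match.groups()
        mutations.append(Mutation(wild_type, int(position), plus == "+", replacement))
    return mutations


if __name__ == "__main__":
    for label in ("S1A", "C+1A", "N152A", "C86S/C117A"):
        print(label, parse_mutations(label))
```

Keeping the "+" flag separate from the number avoids confusing residue 1 of the intein (Ser1) with residue +1 of the C-extein (Cys+1).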
We first investigated the upstream splice junction (Fig. 6a). The S1A mutation completely blocked splicing activity and allowed formation of only minor amounts of C-terminal cleavage product resulting from uncoupled asparagine cyclization. Substitution of this residue with cysteine (S1C mutant) supported full splicing activity, although at a slightly slower rate than with the wild-type sequence (Fig. 6b). To test more directly whether a linear (oxo)ester was formed as an intermediate, we blocked all downstream reactions by introducing the mutations N152A and C+1A at the C-terminal splice junction. The expected (oxo)ester formed at the upstream scissile bond was then probed for nucleophilic cleavage using DTT and hydroxylamine, both of which are known to cleave thioesters efficiently and oxoesters only poorly (9). Reducing conditions were found to be required to resolve efficient formation of disulfide-linked homodimers in case of the MBP-Int N (S1C)-His 6 construct (Cys1 is the only cysteine residue in this protein). As shown in Fig. 6c, a fraction of the wild-type MBP-Int N -His 6 precursor protein (with Ser1) was cleaved to give free MBP, both in the absence and presence of nucleophiles including tris(2-carboxyethyl)phosphine as a non-nucleophilic control for reducing conditions. In contrast, the S1A mutant rendered the protein completely resistant against cleavage at the scissile bond (data not shown). Importantly, cleavage was strictly dependent on the presence of the complementary construct Int C (N152A,C+1A)-Trx-His 6, underlining that intein reconstitution is required (data not shown). In the case of the S1C construct, cleavage with both nucleophiles DTT and hydroxylamine was clearly more efficient (Fig. 6c), whereas cleavage with tris(2-carboxyethyl)phosphine was comparable with the Ser1 construct. We also prepared the construct Int C (H90A,N152A,C+1A)-Trx-His 6, in which the highly conserved histidine of block B was mutated. This histidine was shown in other inteins to be important or essential for (thio)ester formation at the N-terminal scissile bond (30,37-39). No cleavage could be observed in combination with the wild-type Int N construct (data not shown). Together, these results are in agreement with the typical rearrangement of the peptide bond around Ser1 or Cys1, respectively. Both the oxoester in the wild-type construct and the thioester in the S1C construct can be hydrolyzed by water. Furthermore, the thioester can be cleaved efficiently by the nucleophiles DTT and hydroxylamine, whereas the oxoester formed by the wild-type intein exhibits a higher stability under these conditions, as proposed earlier for other inteins (9). Activation of the scissile peptide bond depends on the conserved block B/N3 His-90.
Because the most remarkable feature of this intein centers around formation of a linear oxoester that is transesterified into a thioester, we had a closer look at other residues of the block B/N3 motif, which is involved in activation of the upstream splice junction (14,30,34,37-41). Instead of the consensus XTXXH motif, the GOS-TerL intein features the sequence CSXXH. Investigation of the aforementioned H90A mutation as a single substitution showed complete abrogation of protein splicing, whereas significant amounts of C-terminal cleavage products were formed (Fig. 7a). More surprisingly, S87A and S87T mutations also virtually completely inhibited splicing. In addition to C-terminal cleavage, minor amounts of N-terminal cleavage could also be observed. Thus, Ser-87 appeared essential for splicing with Ser1, but only important for the N-O acyl shift. Its mutation also impaired the coordination with the transesterification step. Interestingly, in the context of the S1C mutation protein splicing was possible with good efficiencies in the presence of either the S87A or S87T substitution (Fig. 7b), showing that the contribution of the block B/N3 serine is not essential for the equivalent pathway with Cys1. Both steps, the N-S acyl shift with S1C and the subsequent transesterification between two thioesters, are expected to lend themselves more easily to catalysis from the chemical viewpoint. The S1C mutation thus compensates for a mutation of Ser-87. In contrast, His-90 was also essential for splicing in combination with S1C, underlining its crucial role for activation of the upstream splice junction. Fig. 7b shows that a S1C/H90A double mutant was completely inactive. Cys-86 is part of the block B/N3 motif and located next to the essential Ser-87. The thiol side chain of this residue may also be involved in the coordination of the scissile peptide bond at the upstream splice junction or the stabilization of the oxyanion in the tetrahedral intermediate, for example. Both mutations C86A and C86S were found to negatively affect protein stability and solubility. However, sufficient quantities of protein material could be recovered following Centricon concentration. Fig. 7c shows that both mutants showed activity in protein splicing; thus an essential role of Cys-86 could be ruled out.
The GOS-TerL Intein Does Not Employ Alternative Thioester Intermediates-Because the sequence of acyl shifts from oxoester to thioester is unique in the GOS-TerL intein we also tested the possibility that other side chains with nucleophilic groups within the intein take part in a stepwise acyl transfer mechanism. A diverting pathway was described for the class 3 inteins  where an internal cysteine side chain binds the N-extein as a thioester intermediate (11). Having already ruled out an essential role of Cys-86 for such a potential intermediate (see above), we also mutated the remaining cysteine at position 117. Fig. 7c shows that the intein with the C117A mutation exhibited full splicing activity. Also the double mutation C86S/C117A showed some remaining splicing activity, even though the stability of the protein against partial degradation was further reduced. Collectively, these results clearly show that the splicing pathway of the GOS-TerL intein does not use additional and atypical thioester intermediates.

Mutations of Key Residues in the Block F/C2 and G/C1 Motifs of the GOS-TerL Intein Are in Agreement with Their Canonical Role-The D132A mutation of the conserved Asp in block F/C2 reduced splicing activity and uncoupled N- and C-terminal cleavage reactions (Fig. 7d) as previously reported for other inteins (42,43). An individual mutation of the highly conserved Asn in the block G/C1 motif blocked splicing and all C-terminal activity, and also induced quantitative N-terminal cleavage (Fig. 7d). Together, these results are consistent with the established roles of the investigated residues in the canonical splicing pathway.

Mutational Analysis of the Block F/C2 Motif Reveals Strict Dependence of the GOS-TerL Intein on Cys+1-Fig. 6d shows that a C+1A mutation at the C-terminal splice junction blocked splicing, as expected, but allowed for some N-terminal cleavage by hydrolysis. Interestingly, introduction of the C+1S mutation also impaired splicing. Even the combination of the C+1S and S1C mutations was inactive. Both of these combinations showed clearly detectable N-terminal and C-terminal cleavage activity, suggesting that mainly the transesterification step and not the N-O acyl shift was impaired.
The Ultrafast gp41-1 Intein and the Homologous AceL-TerL Intein Are Inactive with a Ser1/Cys+1 Combination-The AceL-TerL intein is the closest characterized homolog of the GOS-TerL intein; however, it features a Cys1/Cys+1 combination at the two critical residues at the splice junctions. It also contains the uncommon serine residue in the block B/N3 motif, similar to Ser-87 in the GOS-TerL intein. Thus, we wondered whether the AceL-TerL intein may also support protein splicing with an artificially introduced Ser1. For this naturally split intein a semi-synthetic splice assay was previously established, in which the short Int N fragment (25 aa) was provided on a synthetic and fluorescently labeled peptide pep-C1 (25). We synthesized a new peptide (pep-C1S) containing the C1S substitution (Table 4). Fig. 8 shows that the AceL-TerL intein was fully active in the trans-splicing assay using pep-C1, as reported previously (25); however, no splice product formation or cleavage reaction could be observed with pep-C1S. Thus, the AceL-TerL intein does not support protein splicing with the Ser1/Cys+1 combination.
On a similar account, we chose the gp41-1 intein (26) as another test case for the unusual Ser1/Cys+1 combination (Fig. 9). This intein has a native Cys1/Ser+1 combination. One reason to test this intein was the CSXXH signature in its block B/N3 motif analogous to the GOS-TerL intein. Furthermore, the gp41-1 intein is the split intein with the highest kinetic rate reported so far (27). Thus, residual activity with a Ser1/Cys+1 combination should be observable more easily than with other inteins. First, we mutated the native Cys1 to serine. The right panel in Fig. 9b shows that no splice product formation could be detected with the resulting Ser1/Ser+1 combination (C1S mutation). Interestingly, after prolonged periods of time, substantial amounts of N-cleavage product (MBP) were generated, indicating that the N-O acyl shift is at least possible. This reaction was abrogated in a control construct with a C1A mutation, which exhibited only slow C-cleavage (Fig. 9b, left panel), consistent with an earlier study (27). On the other hand, a S+1C mutation to generate the Cys1/Cys+1 combination resulted in only a small decrease of splicing efficiency. Low levels of N-terminal cleavage product indicated that the intein is not perfectly optimized for this combination, although the thiol side chain of Cys+1 should be a better nucleophile than the hydroxyl of Ser+1. Finally, the Ser1/Cys+1 combination was found to be virtually inactive with only trace amounts of splice product being observable. Significant amounts of N-terminal cleavage product indicated that a linear ester could be formed, but not be further processed in a directional manner (Fig. 9b, right panel). These findings underline once more how specialized even the most active inteins are toward their nucleophilic side chains at positions 1 and +1. Other factors must be important to make the Ser1/Cys+1 combination work.

The GOS-TerL Intein Tolerates a Truncation of Its Int N Fragment and Is Capable of Cross-splicing with the AceL-TerL N Fragment-We noticed that the sequence similarity between the Int N pieces of the GOS-TerL and AceL-TerL inteins is very high in the last 11 amino acids of the shorter AceL-TerL intein, i.e. within the N2 motif. The GOS-TerL N fragment has 8 or 9 amino acids ((S)FNERKFNE) as a C-terminal extension beyond this conserved block (Fig. 2). Furthermore, the sequence similarity is very low within the first 10 amino acids of the two Int N pieces, including Ser1 of the GOS-TerL intein and Cys1 of the AceL-TerL intein. These observations led us to perform truncation experiments of the GOS N fragment and cross-splicing assays between both inteins.
We could show that the model protein MBP-Int N (Δ9aa)-His 6 with a C terminally truncated variant of the GOS N fragment was indeed capable of protein splicing with kinetics and yields indistinguishable from those of the wild-type construct (data not shown). Thus, the C-terminal amino acids of the GOS N fragment seem to be superfluous at least under these experimental conditions. This finding also suggests that the split site of the GOS-TerL intein corresponds to that of the AceL-TerL intein, i.e. in the loop sequence before β-sheet 4 of the conserved HINT-fold (44,45).
Next, we tested exchanging the Int N fragments of the GOS-TerL and AceL-TerL inteins in cross-splicing experiments. Very surprisingly, the GOS C construct was also capable of splicing with the heterologous AceL N peptide pep-C1 (Fig. 10a). Even more strikingly, the GOS C could also form a splice product with the pep-C1S peptide (Fig. 10a). In contrast, the AceL C protein could only splice with its corresponding AceL N partner peptide or protein (Fig. 10b), but with neither the full-length nor the C terminally truncated GOS N proteins (Fig. 10, c and d). The restricted activity of the AceL-TerL intein was independent of the nature of the residue at position 1, as AceL C was also incapable of splicing with the GOS N (S1C) or shortened GOS N (S1C, Δ9aa) mutants (Fig. 10, c and d). The promiscuity of the GOS C fragment toward both AceL N fragments is truly remarkable, given the diversity of the sequences within the first 10 amino acids that are expected to be a crucial part of the refolded intein complex. These results suggest that the exact sequence of the GOS N fragment is not important for the ability of the intein to splice with a Ser1 residue in general and with the unique Ser1/Cys+1 combination in particular. Rather, the special features that enable this intein for this unusual mechanism should be located within the GOS C part.

Conclusions
The newly discovered GOS-TerL intein is the first and only intein described so far with the unexpected Ser1/Cys+1 combination of residues at the two splice junctions. Moreover, artificial introduction of this combination into other inteins has not yet been reported to result in efficient protein splicing (17). As shown in this work, this also holds true for the gp41-1 and AceL-TerL inteins, which naturally employ Cys1/Ser+1 and Cys1/Cys+1 combinations, respectively. These two inteins represent the fastest intein known and the closest homolog of the GOS-TerL intein, respectively.
Collectively, our mutational analysis of the GOS-TerL intein clearly supports the notion that its splicing pathway follows that of a typical class 1 intein in terms of the overall logic of the reactive intermediates and the participating key residues. Thus, the upstream extein N-intein peptide bond is first rearranged into the oxoester, which is then converted into the thioester of the branched intermediate, before cleavage of the downstream intein-extein C peptide bond irreversibly progresses the splicing mechanism beyond the potential equilibrium between the starting material and the two ester intermediates. Liberation of the N-terminal amine group of the extein C also enables rearrangement into the new extein N-extein C peptide bond (Fig. 1b).
FIGURE 10. Cross-splicing between the GOS-TerL and AceL-TerL inteins. The wild-type fragments from both inteins as well as specific variants were mixed in cross-combinations as indicated. See Fig. 7 for the investigation of the C1S mutations in the AceL-TerL intein. Shown are SDS-PAGE gels stained with Coomassie Brilliant Blue and partly illuminated under UV light (in a). Reactions with the GOS-TerL-Int C and AceL-TerL-Int C fragments were carried out at the optimal temperatures of the respective inteins, 30 and 8 °C. Reactions were performed with an excess of the respective Int N (45 μM) over the Int C fragments (15 μM). a, AceL-TerL N with GOS-TerL C to address cross-splicing and C1S mutation. b, AceL-TerL N with AceL-TerL C to test comparable Int N construct of the AceL-TerL intein. c, GOS-TerL N with AceL-TerL C to investigate cross-splicing and S1C mutation. d, GOS-TerL N with GOS-TerL C to address effect of deletion and S1C mutation. Fl, 5(6)-carboxyfluorescein; *, protein contamination.
Why is the Ser1/Cys+1 combination so unusual in the protein splicing pathway? One could speculate that in the evolution of inteins this combination was avoided because the shift from an oxoester as the first intermediate to a thioester as the second intermediate is an uphill reaction in a thermodynamic sense. The intein would have to catalyze two consecutive uphill steps starting from the peptide bond at the upstream splice junction as the ground state. In contrast, in all other previously observed combinations the transesterification step to the branched intermediate is either formally a downhill reaction (in Cys1/Ser+1 and Cys1/Thr+1 combinations) or at least a reaction on the same energetic level (in Cys1/Cys+1, Ser1/Ser+1, and Ser1/Thr+1 combinations). Following the initial uphill reaction from the peptide bond to the linear oxoester or thioester intermediate, a downhill or equal-level transesterification reaction may be important to control the desired flux in the equilibrium of intermediates along the reaction coordinate of the protein splicing pathway. A second uphill reaction may be disfavored for this purpose and block splicing. Notably, the splicing pathways of class 2 and class 3 inteins also contain only one formal uphill reaction (11).
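The formal energetic bookkeeping discussed here can be restated as a minimal Python sketch. It is only an illustration of the qualitative argument, not a thermodynamic calculation: it assumes that the direction of the transesterification step depends solely on the ester type formed by each nucleophile (thioester for Cys, oxoester for Ser or Thr), and the names `ESTER_OF`, `transesterification_direction`, and `count_formal_uphill_steps` are hypothetical helpers introduced for this example.

```python
# Qualitative sketch only: classifies residue combinations by ester type.
# Assumption: Cys forms a thioester, Ser and Thr form an oxoester, and the
# formal direction of the transesterification depends only on these types.
ESTER_OF = {"Cys": "thioester", "Ser": "oxoester", "Thr": "oxoester"}


def transesterification_direction(res1: str, res_plus1: str) -> str:
    """Return 'downhill', 'level', or 'uphill' for the step from the linear
    ester (residue 1) to the branched ester (residue +1)."""
    first = ESTER_OF[res1]
    second = ESTER_OF[res_plus1]
    if first == second:
        return "level"
    if first == "thioester":
        # thioester -> oxoester is formally downhill
        return "downhill"
    # oxoester -> thioester is formally uphill
    return "uphill"


def count_formal_uphill_steps(res1: str, res_plus1: str) -> int:
    """Count formal uphill steps: the initial shift from the peptide bond to
    the linear ester is always counted as one uphill step."""
    second_step_uphill = transesterification_direction(res1, res_plus1) == "uphill"
    return 1 + int(second_step_uphill)


if __name__ == "__main__":
    for pair in [("Cys", "Ser"), ("Cys", "Thr"), ("Cys", "Cys"),
                 ("Ser", "Ser"), ("Ser", "Thr"), ("Ser", "Cys")]:
        print(pair, transesterification_direction(*pair),
              count_formal_uphill_steps(*pair))
```

Under these assumptions, only the Ser1/Cys+1 combination is assigned two formal uphill steps, which mirrors the argument in the text.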
Of course, each intein may simply be highly specialized for the activation of its particular catalytic cysteine, serine, or threonine residues. It is conceivable that in an optimized local environment the reactivity and positioning of the catalytic side chains as well as the acyl intermediates may be tuned for each of the mentioned combinations to offset the energetic differences. However, the very uneven distribution of combinations of the 1 and +1 residues in natural inteins as well as mutagenesis studies clearly indicate that the Ser1/Cys+1 combination is strongly disfavored. Thus, we propose that the GOS-TerL intein either modulates activities of the nucleophiles by an optimized microenvironment or employs very efficient strategies to irreversibly remove the branched thioester intermediate, i.e. by an efficiently coupled asparagine cyclization.
The GOS-TerL intein tolerates well the substitution of the catalytic Ser1 to cysteine (resulting in the Cys1/Cys+1 combination). In contrast, it is highly specialized for the Cys+1 nucleophile, because the Ser1/Ser+1 combination is inactive. Even the combination Cys1/Ser+1 is virtually inactive, although it is found in many other inteins.
Notably, mutations from Ser to Cys at both splice junctions have yielded active inteins, as shown here for the GOS-TerL and the gp41-1 inteins, respectively. Mutations from Cys to Ser/Thr have also been reported to work at the +1 position (46,47), although they yield nearly inactive inteins in most cases (28). However, a Cys to Ser mutation at the 1 position that retained high splicing efficiency has not been reported to our knowledge. Interestingly, this mutation can give rise to high yields of N-terminal cleavage, as shown here, for example, by the gp41-1 intein, indicating that the N-O shift can still be performed. These findings point to the transesterification reaction being the critical step in artificial Ser1/Cys+1 or Ser1/Ser+1 combinations, either because of non-optimized electronic activation of the reaction partners or because of their slightly mispositioned arrangement in the active site. Alternatively, the transesterification step may occur in these cases, but the branched intermediate is hydrolyzed before asparagine cyclization can take place.
The amino acids in block B/N3 are important for the initial N-S or N-O shift. The His and Thr (or Ser) in the TXXH motif have been found to be critical or important in many inteins. Interestingly, in some cases one of these residues can be mutated to Ala without substantial loss of activity (34,37). For the GOS-TerL intein with the native Ser1/Cys+1 combination both Ser-87 and His-90 were found to be indispensable. Interestingly, however, in the artificial Cys1/Cys+1 combination, Ser-87 of block B/N3 could be mutated to Ala, whereas His-90 was still indispensable. This finding is in agreement with the idea that the Ser1-oxoester is chemically less activated than the Cys1-thioester. Therefore, formation or stabilization of the Ser1-oxoester should require more catalytic assistance from the intein. The Thr or Ser at the conserved position corresponding to Ser-87 of the GOS-TerL intein has previously been suggested to help in distorting the scissile peptide bond at the upstream splice junction to facilitate the N-S acyl shift (41). A similar role in the N-O shift as well as a role in preparing the oxoester for the transesterification reaction is conceivable for the GOS-TerL intein.
Strikingly, the GOS C fragment forms a highly active intein with the AceL N fragment, whose N1 motif sequence is substantially different. A C1S mutation in the AceL N fragment, which reconstitutes the Ser1/Cys+1 combination, also yields an active AceL N/GOS C hybrid intein system. Thus, the most important determinants for the unusual splicing pathway of the GOS-TerL intein seem to be located in its Int C fragment. These remain to be identified in future studies to further unravel the enzymatic mechanism for the unprecedented sequence of acyl shifts.
The fact that the GOS N fragment is free of cysteine residues makes this intein an interesting candidate for certain biotechnological applications, such as protein labeling approaches (48-50). The small size of this fragment of 37 or 28 aa also calls for applications using semisynthetic protein trans-splicing (23,25,51,52).
In conclusion, the GOS-TerL intein can master the unprecedented splicing pathway from peptide bond to oxoester to thioester, with seemingly two steps uphill with respect to the energy levels of the intermediates. Its splicing mechanism thus represents an intriguing diversion from the canonical pathway of class 1 inteins.
Author Contributions-A. L. B. and H. D. M. conceived the study, analyzed the data, and wrote the paper. A. L. B. performed all experiments except mass spectrometry.