Protein Splicing in the Absence of an Intein Penultimate Histidine

Protein splicing is a self-catalytic process in which an intervening sequence, termed an intein, is excised from a protein precursor, and the flanking polypeptides are religated. The conserved intein penultimate His facilitates this reaction by assisting in Asn cyclization, which results in C-terminal splice junction cleavage. However, many inteins do not have a penultimate His. Previous splicing studies with 2 such inteins yielded contradictory results. To resolve this issue, the splicing capacity of 2 more inteins without penultimate His residues was examined. Both the Methanococcus jannaschii phosphoenolpyruvate synthase and RNA polymerase subunit A′ inteins spliced. Splicing of the phosphoenolpyruvate synthase intein improved when its penultimate Phe was changed to His, but splicing of the RNA polymerase subunit A′ intein was inhibited when its penultimate Gly was changed to His. We propose (i) that inteins lacking a penultimate His arose by mutation from ancestors in which a penultimate His facilitated splicing, (ii) that loss of this His inhibited, but may not have blocked, splicing, and (iii) that selective pressure for efficient expression of the RNA polymerase yielded an intein that uses another residue to assist Asn cyclization, changing the intein active site so that a penultimate His now inhibits splicing.

Protein splicing is a post-translational event, analogous to RNA splicing, that involves the removal of an internal protein fragment (intein) from a precursor protein and the joining of the two flanking sequences (exteins) to produce an active extein protein (1). As of January 2000, 100 putative inteins had been identified in all three phylogenetic domains (see the Intein Registry in InBase (2)). Inteins have 2 structural domains (Fig. 1A): a splicing domain composed of the N-terminal and C-terminal splicing regions, and a central region encoding either a homing endonuclease (3,4) or a small linker. The structure of the intein splicing domain is conserved among inteins, forming a protein fold termed the HINT module (5-8). The homing endonuclease domain is dispensable for splicing. Residues that mediate or assist the protein splicing reaction have been identified by sequence comparison, mutation, and structural analysis (6, 8-19) (Fig. 1A).
The protein splicing reaction requires the intein splicing domain plus the first amino acid of the C-extein. The self-catalytic protein splicing mechanism (reviewed in Noren et al. (20)) consists of four steps, each involving a nucleophilic displacement (Fig. 2): (i) formation of a linear (thio)ester by an acyl rearrangement of the conserved Cys or Ser at the intein N terminus; (ii) formation of a branched intermediate by transesterification, when the thiol or hydroxyl group of the Ser, Thr, or Cys at the beginning of the C-extein attacks the (thio)ester from step (i), transferring the N-extein to the side chain of the first C-extein residue; (iii) resolution of the branched intermediate by cyclization of the intein C-terminal Asn or Gln, which cleaves the peptide bond between the intein and the C-extein; and (iv) formation of a native peptide bond between the exteins by a spontaneous S-N (or O-N) acyl rearrangement. Dead-end side reactions are often observed when an intein is expressed in a heterologous extein context. N-terminal splice junction cleavage products form when the (thio)ester is cleaved either at the N-terminal splice junction or at the branch point of the branched intermediate. C-terminal splice junction cleavage results when Asn or Gln cyclization precedes step (i) or (ii). Although inteins are not true enzymes, in the sense that they act only once and do not turn over multiple substrates, they use the same mechanisms as enzymes to facilitate splicing, including oxyanion holes at each splice junction (6,20). This paper focuses on the role of the intein penultimate residue, which, when it is His, is thought to facilitate Asn cyclization and thus C-terminal splice junction cleavage. Inteins must increase the rate of Asn or Gln cyclization, since Asn cyclization takes several days at 37°C, pH 7.4, in model peptides, and cyclization of Gln is even slower due to entropic factors (21).
Moreover, in both proteins and peptides, the preferred reaction is attack by the main chain nitrogen on the side chain carbonyl, leading to deamidation, rather than the reaction required for protein splicing, in which the side chain amide nitrogen attacks the main chain carbonyl (21). Inteins can increase the rate of Asn or Gln cyclization and direct the reaction toward peptide bond cleavage by aligning the reactive groups and by making the peptide bond more labile through an increase in the electrophilicity of the carbonyl carbon. The increased electrophilicity could be achieved by (i) hydrogen bonding to the carbonyl oxygen, thereby stabilizing the developing negative charge on the tetrahedral intermediate of the cyclization reaction, and/or (ii) donating a proton to the main chain nitrogen, making it a better leaving group. Inteins could also facilitate cyclization by increasing the nucleophilicity of the side chain amide group. The intein penultimate His is thought to assist Asn cyclization. Most inteins have a His in the penultimate position (2), and mutation of this His inhibits or blocks splicing (10,11,18). The crystal structures of the Mycobacterium xenopi gyrase subunit A (GyrA) intein (6) and the Sac-

FIG. 1.
A, organization of a typical protein splicing precursor. A precursor is depicted with an intein composed of a homing endonuclease domain (black box) separating the N-terminal and C-terminal splicing regions (white boxes) that form the intein splicing domain. Mini inteins have a short linker instead of an endonuclease domain. Intein motifs are shown above the precursor, and conserved residues in selected blocks are shown below the precursor. Block A contains the conserved Ser or Cys at the N terminus of the intein, although Ala is present at this position in 3 intein families.
Block G contains the conserved dipeptide His-Asn at the intein C terminus and Ser, Thr, or Cys at the beginning of the C-extein. Three inteins have Gln instead of Asn in block G (2,29). B, intein polymorphism at the penultimate residue. As of January 2000, 10 families of intein alleles lack a penultimate His. The number in parentheses is the total number of residues in each intein. Abbreviations are as follows: Ceu, C. eugametos; CIV, Chilo iridescent virus; Aae, Aquifex aeolicus; Spb, Bacillus subtilis SPβ phage; Ssp, Synechocystis sp. PCC6803; Pab, Pyrococcus abyssi; Pho, Pyrococcus horikoshii OT3; Pfu, Pyrococcus furiosus; RIR1, ribonucleoside-diphosphate reductase, α subunit; KlbA, kilB operon orfA; LHR, large helicase-related protein; Lon, ATP-dependent protease LA; Moaa, molybdenum cofactor biosynthesis homolog. C, schematic representation of the Rpol A′ and PEP MEIEP precursors. Splice junction residues, the position of the intein penultimate amino acid (arrow), and the number of extein residues are indicated. Native extein sequences are represented by E, and H represents a 6-aa His tag.

FIG. 2.
The self-catalytic protein splicing reaction. The protein splicing mechanism is depicted with X representing either the oxygen or the sulfur in the side chain of Ser, Thr, or Cys. In some inteins Asn is replaced by Gln, which can cyclize similarly. All tetrahedral intermediates, assisting groups, and proton transfer steps are omitted for clarity.

A′ intein or the Mja PEP intein plus varying lengths of native N- and C-extein sequences (E) to generate MEIEP fusions (Fig. 1C). DNA encoding the M. jannaschii inteins was amplified by polymerase chain reaction (PCR) from genomic DNA using the Mja Rpol A′ primers 5′-ATG CTC GAG CAC CAA ACT TAA TTA TTG AGG AT and 5′-ATT GAG GCC TTC AGC GTT GTA GGG CGG GCA ATT or the Mja PEP primers 5′-ATG CTC GAG ACG ATG TTT GTT AAG GAT GAA AAA and 5′-ATT GAG GCC TCT AGA AAC GAT TGC CGC GTG GCA GTT. The Mja Rpol A′ fragment included 200 native N-extein residues and 5 native C-extein residues. The Mja PEP intein fragment included 150 native N-extein residues and 7 native C-extein residues. Each PCR fragment was digested with XhoI and StuI, purified from agarose gels with a QIAEX II gel extraction kit (Qiagen Inc.), and ligated into gel-purified, XhoI/StuI-cut pMIP21 vector DNA. All mutants were constructed by replacing a restriction fragment in pMEIEP with a PCR fragment generated using the following primers containing the desired mutations: PEP intein F411H primers 5′-AAA TGG AAA CCA ATA AGG GT and 5′-GCG GCC CTA AAA CGA TTG CCG CGT GGC AGT TAT GTA CAA CTA TTGG; Rpol A′ intein G451H primers 5′-GAA TTT GGT ATT GAA TTA AAG and 5′-ATT GAG GCC TTC AGC GTT GTA GGG CGG GCA ATT GTG TGT TAA AAA GCC; Rpol A′ intein primers 5′-AAA GAG AAA TGC CTT AA GAA TGGA and 5′-CCT TCA GCG TTG TAG GGC GGG CAA TTX XXT GTT AAA AAG CC, in which XXX was TGC for G451A, AAA for G451F, TTT for G451K, and CCA for G451W.
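The mutagenic Rpol A′ primer carrying XXX appears to be an antisense (reverse) primer, so each listed triplet should be the reverse complement of the codon it introduces on the sense strand. The short Python sketch below is our own illustration, not part of the original methods: it reverse-complements each triplet and translates it with the standard genetic code to check that reading. The names `reverse_complement`, `encoded_residue`, and `PRIMER_TRIPLETS` are hypothetical. Aligning the G451H primer against the XXX primer places GTG at the same position, which likewise reads as CAC (His).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; spaces (as in the printed primers) are ignored."""
    return seq.replace(" ", "").upper().translate(_COMPLEMENT)[::-1]


def encoded_residue(antisense_triplet: str) -> str:
    """One-letter residue encoded on the sense strand by a triplet read from an antisense primer."""
    return CODON_TABLE[reverse_complement(antisense_triplet)]


# XXX triplets from the Rpol A' mutagenic primer, as listed in the text.
PRIMER_TRIPLETS = {"G451A": "TGC", "G451F": "AAA", "G451K": "TTT", "G451W": "CCA"}

for mutant, triplet in PRIMER_TRIPLETS.items():
    residue = encoded_residue(triplet)
    # The last character of each mutant name is the intended new residue.
    print(f"{mutant}: {triplet} -> {reverse_complement(triplet)} ({residue})")
```

Building the table from `itertools.product` keeps all 64 codons in one compact string rather than a hand-typed dictionary.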
A His6 tag was incorporated at the C terminus of paramyosin by inserting a double-stranded oligonucleotide into the SalI and PstI sites of the PEP intein clone and the SalI and HindIII sites of the Rpol A′ intein clone.
The sequences of all PCR fragments were confirmed by sequencing both DNA strands at the NEB DNA sequencing core facility. All enzymes were obtained from New England BioLabs and used as recommended by the manufacturer.
MEIEP Production and Characterization. E. coli strain TB1 (New England BioLabs) cells containing pMEIEP plasmids were grown at 30°C to mid-log phase and then incubated overnight at 15°C in the presence or absence of 0.4 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG). Proteins containing the C-terminal His tag were purified on nickel columns (Qiagen Inc.) as described by the manufacturer, with 10% glycerol added to all buffers. Induction of MEIEP precursors with IPTG increased proteolytic cleavage within the intein and produced little or no increase in precursor, spliced products, or single splice junction cleavage products. Therefore, experiments were performed with purified samples from uninduced cultures.
MEIEP precursors, spliced products, and cleavage products can be distinguished by their relative mobility on Coomassie Blue-stained SDS-PAGE gels and by immunoreactivity in Western blots using anti-MBP and anti-paramyosin sera (24). To obtain clear separation of MEEP and IEP, samples were electrophoresed on 8% acrylamide SDS-PAGE gels, and the 30-kDa EP product was run off the bottom of the gel. The same samples were also electrophoresed on 12% acrylamide SDS-PAGE gels and stained with Coomassie Blue for analysis of the EP product (data not shown). Coomassie Blue-stained gels were digitized with a Microtek Scanmaker III and quantitated with NIH Image 1.51 software as described (25). Values from at least 3 independent experiments were averaged for each sample, and the standard deviation of the means was calculated. The increase or decrease in the molar percent of spliced product (MEEP) or N-terminal splice junction cleavage product (IEP) was calculated as follows: percent increase = (mutant − wild type)/wild type × 100, and percent decrease = (wild type − mutant)/wild type × 100.
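The two percent-change formulas above can be sketched in Python as follows. This is an illustration only: the replicate values are hypothetical numbers, not data from this study, and we assume the reported standard deviation is the sample standard deviation of the replicate measurements (`statistics.stdev`), which the text does not specify.

```python
from statistics import mean, stdev


def percent_increase(mutant: float, wild_type: float) -> float:
    """(mutant - wild type) / wild type x 100, as defined in the text."""
    return (mutant - wild_type) / wild_type * 100


def percent_decrease(mutant: float, wild_type: float) -> float:
    """(wild type - mutant) / wild type x 100, as defined in the text."""
    return (wild_type - mutant) / wild_type * 100


def summarize(replicates: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of at least 3 independent measurements (assumption)."""
    if len(replicates) < 3:
        raise ValueError("at least 3 independent experiments are required")
    return mean(replicates), stdev(replicates)


# Hypothetical molar percentages of spliced product (MEEP), for illustration only.
wild_type_runs = [50.0, 52.0, 48.0]
mutant_runs = [75.0, 78.0, 72.0]

wt_mean, wt_sd = summarize(wild_type_runs)
mut_mean, mut_sd = summarize(mutant_runs)
print(f"wild type {wt_mean:.1f} +/- {wt_sd:.1f}; mutant {mut_mean:.1f} +/- {mut_sd:.1f}")
print(f"percent increase: {percent_increase(mut_mean, wt_mean):.1f}%")
```

Both changes are expressed relative to wild type, so a 50% increase and a 50% decrease are not symmetric in absolute molar percent.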
N-terminal Sequencing by Edman Degradation. Purified Mja Rpol A′ MEIEP samples were sequenced as described (24). Briefly, protein samples were subjected to electrophoresis on an 8% Tris-glycine polyacrylamide gel (NOVEX) and transferred to a ProBlott polyvinylidene difluoride membrane (PE Biosystems). The membrane was stained with Coomassie Blue R-250; bands corresponding to the branched intermediate (MEIEP*), the MEIEP precursor, and the MEEP spliced product were excised, and each was subjected to sequential Edman degradation. The data were acquired and analyzed on an Applied Biosystems 610A Data System.

Splicing of the Mja PEP and Mja Rpol A′ Inteins in E. coli. The well-studied MIP in vitro protein splicing system (18, 24-26) was used to examine whether a penultimate His is required for splicing of the Mja Rpol A′ and PEP inteins. The MIP precursor consists of an intein (I) inserted in frame between the E. coli maltose-binding protein (MBP or M) and the ΔSal fragment of Dirofilaria immitis paramyosin (P). In this study, native extein sequences (E) were included to improve splicing by better mimicking the native precursor active site, generating the MEIEP fusions (Fig. 1C). The Mja PEP intein has a penultimate Phe at position 411, and the Mja Rpol A′ intein has a penultimate Gly at position 451. Size, immunoreactivity, and in some cases N-terminal protein sequencing were used to identify the precursor (MEIEP), spliced product (MEEP), branched intermediate (MEIEP*), and N-terminal (ME + IEP) or C-terminal (MEI + EP) splice junction cleavage products. Cleared cell lysates were chromatographed over nickel-chelating columns, which retained proteins containing the C-terminal His tag, so ME and MEI were lost. The Mja PEP and Rpol A′ MEIEP precursors spliced efficiently in E. coli (Fig. 3), yielding 60% spliced MEEP product with the Mja PEP intein and 68% with the Mja Rpol A′ intein (Table I). No C-terminal splice junction cleavage product (EP) was observed when samples were electrophoresed on 12% SDS-PAGE gels (data not shown). Spliced product did not increase when protein samples were incubated overnight in vitro at 16 or 37°C at pH 6.0 to 8.5 (data not shown). It is not known why a proportion of precursor usually fails to splice in heterologous systems, although this has most often been attributed to misfolding or aggregation.

Protein Splicing of Mja PEP and Mja Rpol A′ Inteins in MEIEP after Mutation of the Penultimate Intein Residue to His. The Mja PEP intein penultimate Phe and the Mja Rpol A′ intein penultimate Gly were mutated to His to see whether "reversion" to this normally conserved residue would improve splicing. Spliced MEEP product increased by 43% in F411H mutant Mja PEP intein samples compared with wild type samples (Fig. 3 and Table I). However, replacing the Mja Rpol A′ intein penultimate Gly with His resulted in a 27% decrease in spliced product, a 57% increase in IEP N-terminal splice junction cleavage product, and the accumulation of a slowly migrating protein. Amino acid sequencing identified this protein as the branched intermediate, since the predicted pair of amino acids was present in each of 15 cycles of Edman degradation (Table II): one amino acid in each cycle corresponded to the MBP sequence, and the other to the Mja Rpol A′ intein sequence. The identities of the Mja Rpol A′ precursor (MEIEP) and spliced product (MEEP) were also confirmed by protein sequencing (data not shown). The accumulation of branched intermediate and N-terminal splice junction cleavage products indicates that only the Asn cyclization step was inhibited, not the acyl shift or transesterification steps (Fig. 2).
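The branched intermediate carries two free N termini (the MBP N terminus and the intein N terminus), so each Edman cycle releases two residues. The Python sketch below illustrates how such paired calls can be checked against two candidate N-terminal sequences. It is our illustration, not the authors' analysis: the functions are hypothetical, and `CHAIN_A` and `CHAIN_B` are arbitrary placeholder strings, not the real MBP or Mja Rpol A′ intein sequences.

```python
from typing import List, Sequence, Tuple


def expected_pairs(chain_a: str, chain_b: str, cycles: int) -> List[Tuple[str, ...]]:
    """Residue pair released at each Edman cycle from a protein with two free N termini.

    Pairs are sorted so the comparison ignores which chain a residue came from.
    """
    if min(len(chain_a), len(chain_b)) < cycles:
        raise ValueError("each N-terminal sequence must cover every cycle")
    return [tuple(sorted((chain_a[i], chain_b[i]))) for i in range(cycles)]


def matching_cycles(observed: Sequence[Tuple[str, str]], chain_a: str, chain_b: str) -> int:
    """Count Edman cycles whose observed residue pair equals the expected pair."""
    expected = expected_pairs(chain_a, chain_b, len(observed))
    return sum(tuple(sorted(obs)) == exp for obs, exp in zip(observed, expected))


# Hypothetical placeholder N-terminal sequences (one-letter codes), NOT real sequences.
CHAIN_A = "ACDEFGHIKLMNPQR"  # stands in for the MBP N terminus
CHAIN_B = "SVYWTRELKGNAHDP"  # stands in for the intein N terminus

# Simulated calls for 15 cycles, with the two residues of each cycle in arbitrary order.
observed = [(b, a) for a, b in zip(CHAIN_A, CHAIN_B)]
print(matching_cycles(observed, CHAIN_A, CHAIN_B), "of", len(observed), "cycles match")
```

Sorting each pair makes the check independent of the order in which the two residues are reported, which matters because Edman chromatograms do not indicate which chain a residue came from.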
Further Substitutions of the Mja Rpol A′ Intein Penultimate Residue. The Mja Rpol A′ intein penultimate residue, Gly-451, was replaced with Ala, Lys, Phe, or Trp in the MEIEP context (Fig. 4 and Table III). The Ala, Lys, and Phe substitutions decreased spliced product (~27%) to a similar extent as the G451H mutation. G451H and G451F increased the amounts of N-terminal splice junction cleavage products and branched intermediate, suggesting that the His and Phe substitutions inhibited only Asn cyclization. The bulky penultimate Trp caused the largest reduction in spliced product (59%) but still allowed N-terminal splice junction cleavage and accumulation of branched intermediate. No C-terminal splice junction cleavage product (EP) was observed in any sample (data not shown).

DISCUSSION

A His is present at the penultimate position in 83% of inteins (2). Mutagenesis and structural studies have demonstrated the importance of this residue in facilitating Asn cyclization and protein splicing (6, 8, 10-12, 18). Contradictory results were obtained when the first two inteins naturally lacking a penultimate His were tested for the ability to splice: the C. eugametos ClpP intein, with a penultimate Gly (19), failed to splice in E. coli, whereas the Synechocystis sp. PCC6803 DnaE intein, with a penultimate Ala, did splice (22). The present study examined two more inteins that lack a penultimate His. The M. jannaschii PEP and Rpol A′ inteins, with a penultimate Phe and Gly, respectively, both yielded at least 60% spliced product in E. coli, even though neither Phe nor Gly can participate chemically in Asn cyclization. In experiments to be published elsewhere (M. W. Southworth and F. B. Perler, manuscript in preparation), we have also demonstrated splicing of the Pyrococcus sp. GB-D KlbA intein, which has a penultimate Ser. To date, half of the intein families that lack a penultimate His have been examined, and all but 1 are capable of splicing. We therefore suggest that the failure of the C. eugametos ClpP intein to splice in E. coli (19) is the exception rather than the rule, and that most inteins that lack a penultimate His will probably be active. The C. eugametos ClpP intein may splice in its native organism, since there is precedent for active inteins failing to splice in E. coli (27).
The splicing reaction requires that the two splice junctions be in close proximity. Inteins are thought to have a complex active site with two oxyanion holes: one at the C-terminal splice junction that facilitates Asn or Gln cyclization and includes the intein penultimate His, and a second at the N-terminal splice junction that facilitates the first two nucleophilic displacements and includes residues in intein block B (6,20). Several substitutions at the penultimate position of the Mja Rpol A′ intein had little effect on reactions at the N-terminal splice junction, since N-terminal splice junction cleavage products were observed and the branched intermediate accumulated. However, all of these mutations inhibited Asn cyclization, as evidenced by the failure to resolve the branched intermediate rapidly and the absence of C-terminal splice junction cleavage products. Although these data support the presence of separable oxyanion holes at each splice junction, … (19), the Mja Rpol A′ intein is inhibited by a penultimate His. A penultimate His in the Mja Rpol A′ intein may cause steric hindrance at the intein active site, or may block access to the C-terminal splice junction by a new residue(s) that facilitates Asn cyclization. Four more mutations were made at Gly-451, ranging from the second smallest amino acid, Ala, to the largest, Trp. Substituting Gly with Ala reduced spliced product to a similar extent as His, Lys, and Phe, suggesting either that packing at the C-terminal splice junction is so tight that the slightly larger Ala causes the same steric effect as Phe, or that the conformational flexibility provided by Gly is critical for aligning active site residues at the Mja Rpol A′ intein C-terminal splice junction.
We suggest that all inteins evolved from ancestors that had a penultimate His that facilitated Asn or Gln cyclization. Mutation of the intein penultimate His might have yielded an intein that still spliced, although less efficiently (unless other compensatory mutations occurred simultaneously). Asn or Gln cyclization would then become rate-limiting. Enough active extein might still be synthesized to permit survival of the host until further mutations increased splicing proficiency. In support of this hypothesis, splicing of the C. eugametos ClpP (19), the Synechocystis sp. PCC6803 DnaE (23), and the Mja PEP inteins improved when the intein penultimate residue was "reverted" to His. These inteins may not yet have fully compensated for the absence of a penultimate His: they may have mutated recently, or selective pressure may be low because they may produce sufficient spliced extein despite inefficient splicing. The more essential and highly expressed the gene product, the stronger the selection for mutations that improve splicing. Although dnaE encodes an essential protein that is part of the replicative DNA polymerase, splicing of the Synechocystis sp. PCC6803 DnaE intein need not be very efficient, since only a few molecules of replicative DNA polymerase are generally required per cell (<20 molecules/cell in E. coli) (28). In contrast, Rpol A′ is not only essential but also highly expressed, as a subunit of the archaeal RNA polymerase. Mutations that improved splicing of the Mja Rpol A′ intein would therefore be expected to become fixed rapidly in the population. In fact, the Mja Rpol A′ intein has changed so much that splicing is now inhibited when its penultimate residue is reverted to His. The differences in splicing capacity among inteins that naturally lack a penultimate His may thus reflect different stages in their evolution toward rapid splicing after loss of the penultimate His.
The data also suggest that splicing of some inteins that naturally lack a penultimate His may improve if the native penultimate residue is replaced by His.