Structural and Dynamical Features of Inteins and Implications on Protein Splicing

Protein splicing is a posttranslational modification where intervening proteins (inteins) cleave themselves from larger precursor proteins and ligate their flanking polypeptides (exteins) through a multistep chemical reaction. First thought to be an anomaly found in only a few organisms, protein splicing by inteins has since been observed in microorganisms from all domains of life. Despite this broad phylogenetic distribution, all inteins share common structural features such as a horseshoe-like pseudo two-fold symmetric fold, several canonical sequence motifs, and similar splicing mechanisms. Intriguingly, the splicing efficiencies and substrate specificity of different inteins vary considerably, reflecting subtle changes in the chemical mechanism of splicing, linked to their local structure and dynamics. As intein chemistry has widespread use in protein chemistry, understanding the structural and dynamical aspects of inteins is crucial for intein engineering and the improvement of intein-based technologies.

Protein splicing is a posttranslational modification, a multistep autoprocessing event in which intervening proteins (inteins) excise themselves from their precursors and ligate the flanking external proteins (exteins) (1). Once thought to be anomalies, inteins have since been found in all domains of life. Beyond their biological context, inteins are particularly intriguing because their capacity to process the polypeptide backbone makes them useful tools for protein engineering. Although all inteins share the same fold and have highly conserved sequence motifs in their active sites, they have surprisingly different splicing efficiencies, and their extein sequence preferences differ dramatically. Here, we review studies on intein structure and dynamics that shed light on their complex conserved fold and their divergent efficiency and substrate specificity.

The Intein Fold
The conserved horseshoe-like fold of inteins (Fig. 1A) has been seen in all intein structures solved by NMR and x-ray crystallography (2-11). The fold comprises primarily β-sheets, loops, and two short helices and has pseudo two-fold symmetry (Fig. 1B). Given this symmetry, it has been proposed that this fold arose due to a gene duplication event of some parent protein (6). The intein fold has three remarkable features: 1) the topology is complex, involving multiple passes of the polypeptide chain back and forth between the symmetry-related halves (Fig. 1B); 2) the extein-bearing termini of the intein are brought into close proximity (<10 Å) for splicing; and 3) multiple protease-like active sites composed of conserved sequence motifs are built around these termini to carry out each of the chemical steps involved in protein splicing. Folding of an intein is coupled to the reaction; the initial structure facilitates the first step of splicing, and thereafter each splicing step causes local conformational changes, affecting the fold of the catalytic apparatus and hence affecting the reaction coordinate and shifting the equilibrium position.
The splicing motifs are found in two splicing regions, the N-terminal splicing region (N-intein) and the C-terminal splicing region (C-intein) (Fig. 2A). In standard contiguous inteins (cis-splicing), these two splicing regions are separated by homing endonuclease or linker sequences, whereas in split inteins (trans-splicing), the splicing regions are translated separately. In a functional intein fold, these two splicing regions, whether in contiguous or in split inteins, are entwined to bring all catalytic residues and the termini together, which is a unique folding/molecular recognition problem preceding protein splicing. Surprisingly, despite the fact that protein splicing is initiated by this folding event, few studies have addressed the folding of inteins (12,13).
The intertwined fold of the fragments requires some degree of disorder in the individual partners. In the Npu DnaE split intein, the C-intein (NpuC) is completely disordered, whereas the N-intein is partially disordered at its C-terminal half in isolation (NpuN C) (12) (Fig. 1C). Folding is initiated by electrostatic charge complementarity between the disordered C-intein and the disordered C-terminal half of the N-intein (the "capture" step), and this intermediate is stabilized further by hydrophobic interactions between the C-intein and the N-terminal half of the N-intein (the "collapse" step) (12) (Fig. 1C). The "capture and collapse" mechanism points toward a general folding pathway for all inteins, including contiguous inteins. The initiating electrostatic interactions are unique to split inteins (14), yet it is possible that the "split site" in split inteins could be a point of nucleation in contiguous inteins; in contiguous inteins with homing endonuclease (HE) domains, the folding of the homing endonuclease domain could nucleate intein folding. The intrinsic disorder seen in the Npu DnaE split intein is also seen in the related Ssp DnaE split intein, further supporting the notion that the folding pathway is conserved (13). Surprisingly, the N- and C-terminal splicing regions of different intein molecules can interact to form a domain-swapped dimer (Fig. 1D). As this type of noncovalent interaction between different split inteins creates hybrid active sites and mixed extein combinations, it has been proposed that such structures may yield alternative spliced products that could expedite evolution by sampling alternate phenotypes under a constant genetic background (3).

The Structural Basis of Protein Splicing
Protein splicing is a multistep process (Fig. 2B). The first step is an N-S/O acyl shift initiated by the N-terminal intein residue, generally a cysteine or serine (15,16), acting on the carbonyl carbon of the -1 residue (1,17) and resulting in a linear (thio)ester intermediate. The next step is a trans-(thio)esterification caused by nucleophilic attack by the C-extein +1 residue, which is a cysteine, serine, or threonine. The resultant branched (thio)ester intermediate is resolved by cyclization of the conserved C-terminal asparagine of the intein. Once the succinimide forms, the intein is excised from the exteins. Finally, the linkage between the exteins rearranges from a (thio)ester to an amide in an intein-independent manner, and the C-terminal succinimide of the intein slowly hydrolyzes (18). All these sequential steps require multiple active sites with distinct residues, all in close proximity to the intein termini. These active-site residues are conserved among most inteins, but several exceptions exist. Below, we discuss the structural details of each step in protein splicing and the unique ways in which some divergent inteins accommodate mutations in the conserved splicing motifs.
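The net outcome of these steps can be illustrated at the sequence level with a minimal Python sketch. It is a bookkeeping illustration only and not a model of the chemistry: it checks the three canonical residues named above (Cys/Ser at intein position 1, Asn at the intein C terminus, and Cys/Ser/Thr at the C-extein +1 position) and returns the ligated exteins and the excised intein. The names Precursor, canonical_problems, and splice, and the toy sequences, are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Canonical residues as stated in the text (one-letter amino acid codes).
INTEIN_FIRST_RESIDUES = {"C", "S"}          # intein residue 1: Cys or Ser
INTEIN_LAST_RESIDUE = "N"                   # conserved C-terminal Asn
C_EXTEIN_PLUS1_RESIDUES = {"C", "S", "T"}   # C-extein +1: Cys, Ser, or Thr


@dataclass(frozen=True)
class Precursor:
    """Hypothetical container: N-extein, intein, and C-extein sequences."""
    n_extein: str
    intein: str
    c_extein: str


def canonical_problems(p: Precursor) -> List[str]:
    """List the deviations from the canonical residue requirements."""
    problems = []
    if not p.n_extein:
        problems.append("N-extein is empty, so there is no -1 residue")
    if not p.intein or p.intein[0] not in INTEIN_FIRST_RESIDUES:
        problems.append("intein residue 1 is not Cys or Ser")
    if not p.intein or p.intein[-1] != INTEIN_LAST_RESIDUE:
        problems.append("intein C-terminal residue is not Asn")
    if not p.c_extein or p.c_extein[0] not in C_EXTEIN_PLUS1_RESIDUES:
        problems.append("C-extein +1 residue is not Cys, Ser, or Thr")
    return problems


def splice(p: Precursor) -> Tuple[str, str]:
    """Return (ligated exteins, excised intein) for a canonical precursor.

    Sequence-level summary of the pathway: acyl shift at the -1/1 junction,
    trans-(thio)esterification onto the +1 residue, Asn cyclization that
    releases the intein, and rearrangement to an amide between the exteins.
    """
    problems = canonical_problems(p)
    if problems:
        raise ValueError("; ".join(problems))
    return p.n_extein + p.c_extein, p.intein


if __name__ == "__main__":
    toy = Precursor(n_extein="AEY", intein="CLSAAAAHN", c_extein="CFN")
    print(splice(toy))
```

The check is deliberately strict. Noncanonical inteins, such as those with an alanine or proline at position 1, do splice by alternate mechanisms, so a flagged precursor in this sketch is not necessarily inactive.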

N-to-O/S Acyl Shift
This step is carried out by the N-terminal cysteine or serine of the intein, which attacks the carbonyl carbon of the C-terminal residue of the N-extein (the -1 residue). The attack is facilitated by the Block B (TXXH) threonine and histidine and the Block F aspartate (Fig. 2C). The Block F aspartate serves as a hydrogen bond donor and stabilizes the C1 thiolate (19) and is a strict requirement for the N-O/S acyl shift reaction (20). The Block B histidine is absent only in the Thermococcus kodakaraensis Tko CDC21-1 intein, which circumvents this lack by using an alternate mechanism facilitated by Lys58, a residue outside the conserved motifs (21). This first step of splicing, the N-O/S acyl shift, poses a substantial challenge, as peptide bond cleavage in general has a large kinetic barrier. It is possible that the energy gained from folding into a very stable conformation is coupled directly to overcoming the energy barrier for the initial splicing step. Such a mechanism is observed in the SEA (named after its initial identification in a Sperm protein, in Enterokinase, and in Agrin (50)) domains of human MUC1, where the autoproteolysis is catalyzed by conformational stress (22). In the Ssp DnaE intein structure, the exteins are tightly packed (11), and upon intein assembly, they may create a steric clash at the active site so that the splicing reaction may be the most kinetically accessible escape mechanism from this clash.
Figure 1 (legend; panel A description incomplete): ... (9). The N-intein (blue) and C-intein (red) are shown. B, secondary structure topology map of Drosophila melanogaster Hog domain, a relative of inteins (PDB 1AT0) (6). The pseudo two-fold symmetry axis is highlighted with a star. C, capture and collapse folding mechanism of the Npu DnaE intein (12). The N-intein is blue (dark blue for NpuN N and light blue for NpuN C), the C-intein is red, and the extein-bearing termini are marked by orange circles. D, three-dimensional domain swapped dimer of a contiguous Npu DnaE intein variant (3).
Also, the structure of the Ssp DnaE intein caught in a redox trap (4), supplemented by further computational and crystallographic evidence (5), shows that the initial rearrangements of residues and the first step, the N-O/S acyl shift, are accelerated by localized structural strain on the -1 N-extein residue, mediated by the conserved Block B threonine, which makes bonding interactions with C1 and the -2 N-extein residue. Additionally, some inteins lack a Block A nucleophile; instead they have an alanine or a proline. Several studies have shown a twisted or destabilized N-terminal scissile peptide bond in the precursor protein (5,23). These inteins directly form a typical branched intermediate upon N-terminal activation (24), or in some cases, they use a Block F cysteine to form a different branched (thio)ester intermediate before the canonical branched intermediate (25). The Mja KlbA intein, with an alanine at its N terminus, has a peptide bond in the cis-conformation, possibly making it unstable or increasing its susceptibility to nucleophilic attack (7).

trans-(Thio)esterification
This is the least understood step in protein splicing because it is probably the most difficult to investigate experimentally (the linear (thio)ester intermediate cannot be isolated). trans-(Thio)esterification requires deprotonation of the Cys/Ser/Thr+1 side chain and its nucleophilic attack on the linear (thio)ester linkage at Cys/Ser1; these two residues are far apart in high-resolution structures (9-10 Å) (11,26,27) (Fig. 2D). In Mtu RecA, Block F aspartate together with the C+1A mutation showed no N-cleavage activity, and Block F aspartate mutated to residues that are not hydrogen bond-capable showed similar results, implying that the Block F aspartate accomplishes its role in trans-(thio)esterification through hydrogen bond interactions (28), which was also supported by NMR-based pKa studies (19). All these studies indicate that the first two steps of protein splicing are achieved by a complex hydrogen-bonding network of three conserved residues: the N-terminal cysteine/serine, the Block B histidine, and the Block F aspartate (29).

Branched Intermediate Resolution
Branched intermediate (BI) resolution is irreversible and achieves the cleavage of the peptide bond between the intein and the C-extein. The side chain nitrogen of the C-terminal asparagine performs a nucleophilic attack on the carbonyl carbon of the peptide bond to form a succinimide, thereby cleaving the intein from the BI (30,31). Indeed, Asn side chain nucleophiles are biochemically rare (also found in N-glycosylation) (32), and succinimide formation in proteins more commonly occurs by nucleophilic attack of a backbone amide nitrogen on an Asn side chain carbonyl carbon, resulting in deamidation of the side chain (33). Both structural and computational studies revealed that the Block F and Block G (penultimate) histidines are important in BI resolution; they activate the asparagine and protonate the forming amine (18) (Fig. 2E). Interestingly, some inteins lack a penultimate histidine (34). In DnaE split inteins, this residue is either a serine or an alanine (35). It is not clear how these divergent inteins circumvent the need for two histidines and carry out the splicing reaction. However, there are other proximal residues that may serve as general acids or bases during splicing. Interestingly, although His48 in DnaE inteins has not been implicated in splicing, it is close to the active site and may substitute for the Block G histidine (Fig. 2F). Although not apparent in crystal structures, it is presumed that each splicing step causes local conformational changes, affecting the reaction coordinate and shifting the equilibrium position.

Extein Dependence on Protein Splicing
It is intriguing that hundreds of different inteins can carry out the same chemistry but with different extein specificity. Different inteins have evolved to splice different proteins in nature, and thus their extein sequence preferences are roughly defined by their native contexts. Deviation from these preferences often results in a profound effect on splicing kinetics and yield, suggesting that the exteins can contribute either chemically or structurally to the intein active site. For example, in the Npu DnaE intein, splicing is minimally affected by the local N-extein sequence, but variation of the local C-extein sequence dramatically affects splicing efficiency (37-39) by perturbing branched intermediate resolution (40). In the structure of the orthologous Ssp DnaE intein, the native Phe+2 side chain packs against and stabilizes the Block F histidine (Fig. 3A). This conserved catalytic histidine lies on a flexible loop and has been implicated as a general base or acid in the BI resolution step in many inteins (11,18). The flexible loop containing the Block F histidine is modulated by the C-extein composition and has been used to engineer Npu into a more extein-tolerant intein. The mutation D124Y on this loop stabilizes and orients His125 in a catalytically favorable pose, hence reducing the dependence of Npu on the Phe+2 extein residue (36). In a computational study on the Mtu RecA intein, in which quantum mechanics/molecular mechanics methods were used to investigate the local extein electronic structure of various +1 residue mutants, it was found that the mutants have different electron affinities and ionization potentials, imposing different energy barriers (41). In the Ssp DnaB mini-intein, analysis of the crystal structure explains why the -1 position can only be occupied by a Gly. Modeling suggests that any bigger residue would abrogate splicing by causing steric clashes (42) (Fig. 3B). Additionally, in the Pho RadA intein, it was shown that the size and charge of the side chain at position -1 have an impact due to electrostatic and packing interactions with the active site; negatively charged amino acids, β-branched amino acids, proline, and amino acids with small side chains have lower splicing efficiency (43) (Fig. 3, C and D).
Figure 2 (legend; panels A-E descriptions incomplete): ... (19). F, an alternate mechanism seen in DnaE inteins, lacking penultimate histidine, is shown on the Npu DnaE intein (PDB 2KEQ) (10).

Distal Mutations Affect Protein Splicing
Mutagenesis studies have identified enhancing/activating mutations distal from the active site, suggesting that the overall structure and dynamics of the intein fold are somewhat plastic (Fig. 4). For example, in DnaE split inteins, several residues that are distant from the active site have profound effects on splicing efficiency; these residues either stabilize the global fold or stabilize and orient the active-site splicing motifs (39). The fast split inteins of this family prefer an aromatic residue at position 56, which is adjacent to the conserved TXXH motif and may facilitate packing interactions and stabilize the TXXH motif. A preferred glutamate at position 89 takes part in an ion cluster shown to be important in stabilizing the intein complex (14), whereas another glutamate is preferred at position 122, close to the Block F histidine. Additionally, Glu23 is important for stability and dynamics and has been shown to be an activating point mutation (44,45). In the minimized Mtu RecA intein, a mutation distal from the active site, V67L, enhances splicing by modifying the active site through improving packing interactions in an internal hydrophobic core (20) and reducing the ensemble distribution (45). This single V67L mutation minimally affects the crystal structure (20) but has profound effects on global dynamics, specifically on the dynamics of the N and C termini of the intein, as shown by NMR chemical shift perturbations and H/D exchange (45). Directed evolution studies also show that distal mutations have enhancing effects on splicing (40,44,46). In the Ssp DnaB mini-intein, the mutations acquired through directed evolution have additive effects, and the mutant intein becomes more tolerant to the local extein sequences and the constraints these sequences impose on the intein active site (44). Similarly, the Npu DnaE intein, when selected against a noncanonical C-extein (SGV instead of the native CFN), accumulates mutations distal from the active site (40).
These mutations were identified as either a part of or juxtaposed to a spatially contiguous network of amino acids predicted to be energetically coupled to the active site. A directed evolution system, using phage display to generate novel inteins with enhanced activity over a broad pH and temperature range, demonstrated that most mutations that accommodated the stress conditions mapped to the surface of the intein. This again emphasizes that peripheral mutations can cause subtle conformational changes in the active-site environment (47). The mutations proximal to the active site typically orient and/or stabilize active-site residues. The distal mutations appear to stabilize the intein fold and/or relay perturbations through a contiguous network of amino acids that can alter the active-site structure and dynamics.

Conclusions
Thus far, traditional structural biology and biochemistry approaches have elucidated many details of the structural basis for protein splicing. However, a detailed understanding of how the intein fold coordinates the chemistry of its multiple active sites is still lacking, as is a better understanding of the general rules for extein preferences. The fact that intein activity can be modulated by mutations distant from the active site suggests that allosteric networks may play a larger role in determining intein activity than previously thought. Ultimately, a better understanding of the structure and dynamics of inteins will be essential for engineering inteins as better biotechnology tools. Currently, some of the reactive intermediates in the splicing pathway are not readily observable using traditional experimental techniques. However, combinations of synthetic chemistry, directed evolution, and computational modeling (for example, see Ref. 48) are likely to provide insights into these areas, resulting in a better understanding of protein splicing and eventually new, highly efficient inteins for protein engineering.