Protein splicing and auto-cleavage of bacterial intein-like domains lacking a C'-flanking nucleophilic residue.

Bacterial intein-like (BIL) domains are newly identified homologs of intein protein-splicing domains. The two known types of BIL domains together with inteins and hedgehog (Hog) auto-processing domains form the Hog/intein (HINT) superfamily. BIL domains are distinct from inteins and Hogs in sequence, phylogenetic distribution, and host protein type, but little is known about their biochemical activity. Here we experimentally study the auto-processing activity of four BIL domains. An A-type BIL domain from Clostridium thermocellum showed both protein-splicing and auto-cleavage activities. The splicing is notable, because this domain has a native Ala C'-flanking residue rather than a nucleophilic residue, which is absolutely necessary for intein protein splicing. B-type BIL domains from Rhodobacter sphaeroides and Rhodobacter capsulatus cleaved their N' or C' ends. We propose an alternative protein-splicing mechanism for the A-type BIL domains. After an initial N-S acyl shift, creating a thioester bond at the N' end of the domain, the C' end of the domain is cleaved by Asn cyclization. The resulting amino end of the C'-flank attacks the thioester bond next at the N' end of the domain. This aminolysis step splices the two flanks of the domain. The B-type BIL domain cleavage activity is explained in the context of the canonical intein protein-splicing mechanism. Our results suggest that the different HINT domains have related biochemical activities of proteolytic cleavages, ligation and splicing. Yet the predominant reactions diverged in each HINT type according to their specific biological roles. We suggest that the BIL domain cleavage and splicing reactions are mechanisms for post-translationally generating protein variability, particularly in extracellular bacterial proteins.

mains rearrange their N'-peptide bond into a thioester bond. This thioester is cleaved by a nucleophilic attack of a cholesterol molecule bound by a downstream domain (3,4). A similar nucleophilic attack occurs during the protein splicing of inteins out of their protein hosts. The rearranged ester/thioester bond at the intein N' end is attacked by the nucleophilic side chain of the intein C'-flanking residue followed by additional splicing reactions (5). Intein protein splicing thus depends on an invariable Cys, Ser, or Thr nucleophilic C'-flanking residue (+1) for the trans-esterification and acyl rearrangement steps (2,6).
BIL domains are distinct from inteins and Hogs in sequence, phylogenetic distribution, and host protein type (1). Each of the two BIL types has characteristic and unique sequence features that cluster them separately from other HINT types. Although inteins are integrated in highly conserved sites of essential proteins and Hogs are present in hedgehog and related nematode proteins, BIL domains are integrated in variable regions of non-conserved diverse bacterial proteins, some of which have extracellular motifs. This leads to the hypothesis that BIL domains may have biological roles different from those of other HINT domains (1). Yet little is known regarding the biochemical activity of each BIL type.
We previously described (1) the catalytic activity of an A-type and a B-type BIL domain. The A-type BIL domain was shown to have protein-splicing and C'-cleavage activities. However, this domain was naturally flanked by a Thr +1 residue, which is typical of inteins but not of A-type BIL domains. Only 15% of known A-type BIL domains are followed by Ser or Thr, and none is followed by a Cys residue. An A-type BIL domain with a +1 Tyr residue was shown recently by Southworth et al. (7) to have N'-terminal cleavage but no protein-splicing activity. The B-type BIL domain was examined previously by us only in a cell-free system. It was shown to be active with preliminary evidence for cleavage and protein splicing. Peptide splicing outside the context of intein-like domains also was shown recently to occur in the proteasome, generating variant peptides to be displayed on major histocompatibility complex class I proteins (8).
Here we examine in detail the auto-cleavage and splicing activity of four BIL domains: one A-type BIL domain with a native non-nucleophilic C'-flanking residue (Ala +1) and three different B-type BIL domains. We also show that BIL domains are present in more major groups of bacteria and in proteins likely to be secreted. The probable functions and chemical reaction mechanisms of BIL domains and their relation to inteins are discussed.

EXPERIMENTAL PROCEDURES
Bacterial Strains and DNA Primers-Rhodobacter sphaeroides 2.4.1 (Rsp) genome was a kind gift from Dr. Steven L. Porter (University of Oxford). Rhodobacter capsulatus (Rca) MD1 genome was a kind gift from Dr. Fevzi Daldal (University of Pennsylvania), and Clostridium thermocellum (Cth) genome was a kind gift from Dr. Ying Tsai (University of Rochester). The following BIL domains were cloned: BIL4-Cth (NCBI gi code 23020817); BIL1-Rsp (NCBI gi code 22959584); BIL2-Rsp (NCBI gi code 22959191); and 1522-Rca (1). The BIL domains were amplified by PCR using the primers in Table I and cloned between two protein tags in a plasmid termed pC2C (as described by Amitai et al. (1)). This plasmid is a modification of the pMALC2 vector (New England BioLabs, Beverly, MA) containing the malE gene for maltose-binding protein (M) from Escherichia coli and a downstream cbd gene coding for chitin-binding domain (C) from Bacillus circulans.
Functional Assay of Protein-splicing and Cleavage Activity-The coding sequence of different BIL domains (B) was cloned in-frame between two protein tags, the maltose-binding protein (M) upstream and the chitin-binding domain (C) downstream. The chimeric protein, M-B-C, was overexpressed in E. coli and extracted as described previously (1). Protein extraction buffer contained 20 mM Tris, pH 7.4, 200 mM NaCl, 1 mM EDTA, and 1 mM sodium azide.
Purification of Tagged Protein Products-Soluble protein products containing either a C or an M tag were purified on affinity columns using chitin (New England Biolabs) or amylose (New England Biolabs) beads, respectively. Lysed cell supernatant in extraction buffer was applied to beads for 1 h at 4°C with shaking. Proteins were eluted from chitin beads by mixing the beads with SDS-PAGE sample loading buffer and boiling for 2-3 min. Extraction buffer with 10 mM maltose was used to elute proteins from amylose beads.
Heat Purification of BIL4-Cth Domain-The supernatant of E. coli cell lysate overexpressing the BIL4-Cth construct was heated in extraction buffer to 37-80°C for 20 min. Soluble proteins were separated from the denatured ones by centrifugation at 13,000 rpm for 3 min and applied on an SDS gel.
In Vitro Protein Transcription/Translation-In vitro transcription/translation was carried out using E. coli S30 extract for circular DNA system (Promega, Madison WI) as described by Amitai et al. (1).
Western Blot Analysis-Western blot analysis was used to identify protein products containing either the M or C tag and to identify the GroEL and DnaK protein chaperones. To identify the M tag, monoclonal mouse antibodies directed at maltose-binding protein (Novus Biologicals, Littleton, CO) were used in a 1:800 ratio. To identify the C tag, polyclonal rabbit antibodies directed at CBD (New England Biolabs) were used in a ratio of 1:5000. Antibodies for GroEL were a kind gift from Prof. Amnon Horovitz (rabbit antibodies, used in a ratio of 1:1000), and DnaK mouse antibodies (Stressgen) were used in a ratio of 1:1000. The secondary antibodies used were horseradish peroxidase-conjugated goat anti-mouse IgG or goat anti-rabbit IgG (Jackson ImmunoResearch Laboratories, West Grove, PA) in a ratio of 1:10000. Chemiluminescence detection was performed using SuperSignal (Pierce) according to the manufacturer's protocol.
Mass Spectrometry (MS) Methods-Intact molecular weight measurements and peptide mass mapping by matrix-assisted laser desorption/ionization (MALDI) MS were performed at the Weizmann Institute Biological Mass Spectrometry unit and at the Smolar Center for proteins (Technion, Israel). Electroelution from gel followed by in-gel digestion with trypsin, chymotrypsin, or V8 proteases was performed and analyzed as described previously (37).
N-terminal Amino Acid Sequencing-Proteins were electrophoresed by SDS-PAGE, and selected bands were prepared as described by Amitai et al. (9) and subjected to Edman degradation at the Weizmann Institute Biological Mass Spectrometry Unit.
Computational Sequence Analysis-Sequence searches used the BLAST programs (10) and the BLIMPS program for block-to-sequence searches (11). Block multiple sequence alignments and phylogenetic analysis were conducted as described by Amitai et al. (1). Protein motifs were detected using the InterProScan tool (www.ebi.ac.uk/interpro/scan.html).

RESULTS
To characterize the proteolytic activity of new A- and B-type BIL domains, each BIL domain (B) was cloned in-frame between two protein tags, maltose-binding protein (M) upstream and chitin-binding domain (C) downstream. Protein products of each chimeric gene (M-B-C) were examined in vivo and in vitro by various methods. To characterize the BIL domain activity in its native protein context, some of the domains were cloned with their full or partial native flanks, whereas others were cloned only with single residue flanks.
Protein Splicing and Cleavage of an A-type BIL Domain with Ala +1 Residue-BIL4-Cth is one of the 23 A-type BIL domains we identified in the thermophilic bacterium Cth (1). It is typical of most A-type BIL domains to have all of the intein protein-splicing active site residues with the exception of the C'-flanking nucleophile (supplemental Fig. S3). Instead of Cys, Ser, or Thr invariably present in inteins, BIL4-Cth is followed by an Ala +1 residue. This is the residue present in 18% of A-type BIL domains (fraction calculated as weighted average of putative active domains).
The BIL4-Cth M-B-C precursor was overexpressed in vivo as a double-tagged protein, and its products were detected and analyzed. Putative protein-splicing products, the excised BIL domain and the ligated M-C flanks, and the M-B and M cleavage products were detected. These products were identified by Western blotting of total cell lysates and affinity-purified proteins separated on SDS-PAGE (Fig. 1A). Relative quantities of products were calculated according to measurements taken from Coomassie Blue-stained SDS gels of amylose-purified proteins and total lysates (supplemental Fig. S1). Only trace amounts of the M-B-C precursor were detected under all of the separation procedures, indicating efficient processing. Spliced product M-C comprised 20-25% of the final products, whereas C'-cleavage product M-B comprised ~5% of the final products. M and B proteins comprised most of the final products, indicating that they were generated by a combination of N'- and C'-cleavages. The final amount of B protein was much larger than the amount of the M-C splicing product. This finding implies that both protein splicing and cleavage at its N' and C' ends released the B protein. The C product was not identified in the gels, perhaps because of cellular degradation.
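The bookkeeping behind these band assignments can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the reaction labels are hypothetical, and the mapping simply restates which species each single auto-processing reaction would release from a double-tagged M-B-C precursor.

```python
def expected_products(reaction, n_tag="M", domain="B", c_tag="C"):
    """Species expected when one reaction acts on the n_tag-domain-c_tag precursor.

    The reaction labels are illustrative names, not terms from the paper.
    """
    outcomes = {
        # flanks ligated, domain excised
        "splicing": {f"{n_tag}-{c_tag}", domain},
        # bond between the upstream tag and the domain is broken
        "n_cleavage": {n_tag, f"{domain}-{c_tag}"},
        # bond between the domain and the downstream tag is broken
        "c_cleavage": {f"{n_tag}-{domain}", c_tag},
        # both junctions broken without ligation
        "n_and_c_cleavage": {n_tag, domain, c_tag},
    }
    if reaction not in outcomes:
        raise ValueError(f"unknown reaction: {reaction}")
    return outcomes[reaction]


def reactions_consistent_with(band):
    """List the reactions whose expected products include the observed band."""
    names = ["splicing", "n_cleavage", "c_cleavage", "n_and_c_cleavage"]
    return [r for r in names if band in expected_products(r)]


if __name__ == "__main__":
    for band in ["M-C", "M-B", "M", "B"]:
        print(band, "->", reactions_consistent_with(band))
```

Under this mapping, only splicing yields M-C, whereas free B can arise either from splicing or from cleavage at both ends, which is why an excess of B over M-C points to cleavage as well as splicing.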
To characterize the putative splicing product using MALDI MS, the M-C band was extracted from the gel and digested with proteolytic enzymes. The presence of the M and C domains was verified using MS/MS analyses. Furthermore, two peptide masses corresponding to splicing-junction peptides were detected from the chymotrypsin digestion of the M-C band. One mass corresponded to a fully cleaved (N'-GSASRVDCGGLTGL-C') peptide, and another mass corresponded to its miscleaved form (N'-GSASRVDCGGLTGLNSGLTTNPGVSAW-C') with high mass accuracies (Table II). The ligated splicing junction is between the second and third residues (Ser-Ala) with the Ser being coded by a linker joining the M tag to the BIL domain and the Ala being the native residue downstream of the BIL domain (Ala +1). Spliced BIL domain was purified, and its identity was verified by MS. We were able to purify the BIL domain by heat treatment, probably because it originated from a thermophilic bacterium. Incubation of total cell lysate at 80°C left only the putative BIL domain in the soluble fraction (Fig. 1B). Intact mass MS analysis of this 15-kDa band identified the expected mass of the BIL domain, and its sequence was verified by MS/MS analysis (see Table IV and data not shown). The exact C' end of the BIL domain was identified by MS analysis as Asn as expected (Table III).
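As an illustration of how theoretical masses for such junction peptides are obtained, the following Python sketch sums standard monoisotopic residue masses and adds one water. It is not the authors' procedure: it assumes unmodified residues (including an unmodified Cys) and singly protonated ions, and the function names are hypothetical. The values reported in Table II are not reproduced here.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O for the free N and C termini
PROTON = 1.007276  # added for a singly charged [M+H]+ ion


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def mh_plus(seq):
    """Expected m/z of the singly protonated peptide (assumed ion type)."""
    return peptide_mass(seq) + PROTON


# Junction peptides quoted in the text (chymotrypsin digest of the M-C band).
FULLY_CLEAVED = "GSASRVDCGGLTGL"
MISCLEAVED = "GSASRVDCGGLTGLNSGLTTNPGVSAW"

if __name__ == "__main__":
    for name, seq in [("fully cleaved", FULLY_CLEAVED), ("miscleaved", MISCLEAVED)]:
        print(f"{name}: M = {peptide_mass(seq):.4f} Da, [M+H]+ = {mh_plus(seq):.4f}")
```

A measured mass is then compared with the theoretical one, usually as a relative error in parts per million.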
A putative C'-cleavage product, M-B, was affinity-purified and identified by anti-M antibodies (Fig. 1A). Its intact mass analysis corresponded to the expected mass of a C'-cleavage product (Table IV). Other masses obtained from this sample corresponded to the M tag and to other smaller masses that could result from a cross-contamination of the M-B band by traces of smaller proteins on the gel.
A protein band corresponding to the M tag was identified by Coomassie Blue staining and by Western blotting using anti-M antibodies (Fig. 1A and supplemental Fig. S1). This putative N'-cleavage product was observed in total lysates and in elutions of both chitin and amylose affinity columns.
To examine whether the Tris cell extraction and protein purification buffer promoted cleavage and splicing of the M-B-C precursor, the extraction and purification procedures were repeated using different buffers (Bis-Tris propane, HEPES, sodium phosphate, and borate). The same products in the same relative amounts were observed with all of these control buffers (data not shown).
In Vivo and in Vitro Cleavage Activities of B-type BIL Domains-B-type BIL domains are more heterogeneous in sequence than A-type domains (1). To characterize their activity, we cloned three different B-type BIL domains into the double-tagged system (described above): the two BIL domains present in R. sphaeroides termed BIL1-Rsp and BIL2-Rsp and one of the 14 BIL domains present in R. capsulatus termed 1522-Rca. The conserved C' sequence motif of B-type BIL domains is distinct from those motifs in other known HINT domains (1). The C' end of the cloned BIL1-Rsp and 1522-Rca is typical of B-type BIL domains, whereas BIL2-Rsp has an atypical C' end (supplemental Fig. S3).
N'-cleavage of B-type BIL1-Rsp-BIL1-Rsp, a B-type BIL domain from R. sphaeroides, was cloned between M and C tags with its native 14-residue N' and 51-residue C' flanks and overexpressed in E. coli cells. The M-B-C precursor and the M and B-C N'-cleavage products were identified by Coomassie Blue staining and Western blotting of total lysate and affinity-purified protein samples (Fig. 2). To verify the nature of the N'-cleavage product, B-C, the band was micro-sequenced. The resulting sequence (XFTPGT) corresponded to the predicted N' end of the BIL domain, which also includes Cys-1, a residue that usually cannot be detected by this method (supplemental Table S-I).
An additional 58-kDa band was co-purified with the M-B-C precursor. Its analysis suggests that the band might include more than a single protein species. Both anti-M and anti-C antibodies reacted with this band. However, the peptide mapping of the band identified peptides from both the M tag and the E. coli GroEL chaperone protein. Additionally, no peptides from the B and C domains were identified (data not shown). Intact mass analysis of the band identified a mass of 58.317 kDa corresponding to GroEL and an additional unidentified protein mass of 65.175 kDa (Table IV). As a control, we checked for cross-reaction of anti-C antibodies with purified GroEL protein (supplemental Fig. S2B). Anti-C antibodies showed reactivity toward GroEL, probably because of their polyclonal nature.
GroEL chaperone was detected in protein samples purified on the following affinity columns: amylose, chitin, and amylose followed by chitin. This indicates a tight and specific binding of GroEL with the precursor and/or protein products. The association of GroEL with unfolded proteins is reversible to some extent upon incubation with ATP-Mg-K (12). Such incubation of washed protein samples bound on chitin reduced but did not eliminate the amount of GroEL eluted from chitin (supplemental Fig. S2B).
C'-cleavage of B-type BIL2-Rsp in Vivo, in Vitro, and in Cell-free Systems-BIL2-Rsp was cloned between M and C tags with one native flanking residue at either end (N'-Leu and C'-Pro) and overexpressed in E. coli and in a cell-free system. In both systems, the main product was the M-B-C precursor with small amounts of M-B and M cleavage products (Fig. 3A). An additional band of ~70 kDa appeared above the precursor band when expressed in vivo. This band was identified as the analysis gave a measured mass of 56.071 kDa, slightly smaller than the expected mass of the putative M-B product.
To examine the in vitro activity of BIL2-Rsp, the overexpressed M-B-C precursor was isolated by sequential affinity columns (amylose followed by chitin) and was incubated in the extraction buffer at different temperatures for different time periods. Increasing amounts of the M-B product were clearly detected within 1 day at 4°C (Fig. 3B). The presence of the M band may be attributed to the N'-cleavage of the BIL domain; however, the complementary B-C band was not detected. Alternatively, this could have resulted from protein degradation.
Similar results were observed when BIL2-Rsp domain was cloned with its full native flanks (data not shown). However, this clone also underwent cleavage at an Arg-Arg dipeptide present in the N'-flank of the BIL domain as verified by N-terminal sequencing. This cleavage was also observed when the flanks were cloned without the BIL domain (data not shown). Thus, we suggest that this activity is unrelated to the BIL domain and is probably due to an E. coli protease (perhaps OmpT) that can cleave the BIL domain flank.
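The comparison of an Edman read with a predicted N terminus, where X marks a cycle with no identifiable residue, can be sketched as a wildcard prefix match. The sketch below is illustrative only: the function name is hypothetical, and the predicted prefix `CFTPGT` is inferred from the text (Cys-1 followed by the residues of the read), not taken from a sequence file.

```python
def matches_edman(read, sequence, unknown="X"):
    """True if an Edman read agrees with the start of a candidate sequence.

    Positions reported as `unknown` (e.g. an undetected Cys) match any residue.
    """
    if len(read) > len(sequence):
        return False
    return all(r == unknown or r == s for r, s in zip(read, sequence))


if __name__ == "__main__":
    edman_read = "XFTPGT"
    predicted_n_terminus = "CFTPGT"  # assumption: Cys-1 plus the read residues
    print(matches_edman(edman_read, predicted_n_terminus))
```

A match at the first residue of the domain, rather than within the upstream flank or the tag, is what assigns the band to cleavage at the domain's N-terminal junction.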
No Activity of B-type Rca-1522 BIL-The 1522-Rca B-type BIL domain is natively present in a very large R. capsulatus protein. The domain is preceded by 1821 residues and is followed by 52 residues. The upstream flank of this BIL domain includes RTX (repeats-in toxin) calcium-binding repeat motifs, characteristic of secreted proteins (13). The BIL domain was cloned with 36 N'-flanking residues and 35 C'-flanking residues in the double tag expression vector. Overexpression of the vector yielded only the M-B-C precursor and E. coli GroEL protein as verified by Coomassie Blue staining, Western blotting, N-terminal sequencing, and MS analysis (Table IV and
Species and Protein Host Distribution of BIL Domains-BIL domains were identified originally in species from Gram-negative α, β, and γ Proteobacteria and from Gram-positive Actinobacteria and the Bacillus/Clostridium group (1). Further database searches now broaden the taxonomic range of BIL domains to major bacterial divisions and lineages (supplemental Table S-II). A-type BIL domains were found in δ Proteobacteria, Cyanobacteria, Spirochaetes, Planctomycetes, and Verrucomicrobia. B-type BIL domains were found in α Proteobacteria, Rhizobium, and Silicibacter species.
Sequence analyses of over a hundred identified BIL flanks reconfirmed our previous observation of the nature of the BIL domain hosts. BIL domains are present in homologs of known and predicted secreted proteins. This is exemplified by Streptomyces avermitilis, Verrucomicrobium, and Gloeobacter A-type BIL domains that are found downstream of long (400-5400 residues) Rhs core elements. Rhs elements are composite genetic elements, and their cores are believed to be cell-surface ligand-binding proteins (14). The BIL domains are present in the hypervariable core extension region that can be shuffled between the core and downstream open-reading frame regions.

DISCUSSION
In this study, we show that a typical A-type BIL domain is capable of protein splicing without a C'-nucleophilic +1 residue and that B-type BIL domains can cleave their N' or C' ends. Both types of domains are not uncommon, appearing in diverse bacterial divisions. These findings reflect the auto-processing nature of intein-like domains. We explain the N'- and C'-cleavage of B-type BIL domains by reactions occurring in the canonical intein protein-splicing mechanism and propose an alternative pathway for A-type BIL domain splicing. Our results suggest that the biochemical activities of the BIL domains are distinct from inteins, and their native biological function is probably protein modification by splicing and cleavage activity.
Protein-splicing Mechanism without a Nucleophilic +1 Residue-Intein protein-splicing mechanism was largely determined by mutational analysis of a few representative intein domains (2,6,15-18). This allowed the delineation of the biochemical reactions of protein splicing and supported splicing as the native activity of inteins. Other lines of evidence for the nature of intein activity are the high efficiency of intein protein splicing, intein distribution in species and host proteins, and the function of intein genes as selfish genetic elements (19).
Currently, the accepted mechanisms for intein protein splicing require a Cys, Ser, or Thr +1 residue at the intein immediate C'-flank. This nucleophilic +1 residue is crucial for the trans-esterification step and for the final acyl rearrangement (Fig. 4, steps 2A and 4A). In inteins with N'-Ala-1, the nucleophilic +1 residue directly attacks the peptide bond at the intein N' end (16). Mutating the intein active site residues, including the +1 nucleophilic residue, abolishes splicing or leads to cleavage of the intein C' end, N' end, or both (15,20).
In our study, the major products of BIL4-Cth expression were N'- and C'-cleavages, whereas protein splicing was approximately a quarter to a fifth of the A-type BIL domain activity with almost complete processing of the precursor. Most probably, the initial cleavage activity was at the C' end, producing the M-B and C products, followed by additional N'-cleavage of the M-B product, producing the M and B products. This is supported by the relative amounts of the final products and the absence of the B-C product.
Our results show protein splicing of an A-type BIL with conserved sequence features closely related to inteins including all of the active site residues apart from the +1 residue. Hence, we propose a modified protein-splicing mechanism for A-type BIL domains. The mechanism is similar to the canonical protein-splicing mechanism of inteins, only differing in the nature of the nucleophilic attack on the thioester bond in the N' end at the BIL domain.
FIG. 2. N-cleavage activity of BIL1-Rsp. Protein products from E. coli overexpression of M-B-C construct with B-type BIL1-Rsp were eluted from amylose (A), chitin (C), or both (A+C) affinity columns or analyzed in total cell lysate (T). Proteins were separated on SDS-PAGE and either stained with Coomassie Blue or detected by anti-M, anti-C, or anti-GroEL (Anti-G) antibodies. See "Results" for discussion of GroEL cross-detection by anti-C antibodies.
Our suggestion includes the following steps of protein splicing in A-type BIL domains (Fig. 4). (i) A thioester is formed at the N' end of the domain by the N-S acyl shift (Fig. 4, step 1) by attack of the thiol group of the conserved Cys-1 residue on the carbonyl group of the peptide bond N-terminal to Cys-1. This reaction is the same as the first step of canonical intein protein splicing (15,18,20). (ii) Concomitantly, the conserved Asn residue at the C' end of the domain undergoes cyclization into an aminosuccinimide ring, cleaving the peptide bond at the domain C' end (Fig. 4, step 2B). This step generates two intermediate products: the N'-flank covalently connected to the BIL domain by a thioester bond and the detached C'-flank. This reaction also occurs in intein protein splicing but only after ligation of the two intein flanks (Fig. 4, step 3A) (5,21). In inteins, premature Asn cyclization results in C'-cleavage and no splicing (22). Although the timing of Asn cyclization is tightly controlled in inteins, it can still occur when other steps of the splicing are blocked by mutations at the N'- and/or C'-splice junction (17,23-25). (iii) The free N terminus of the C'-flank performs an aminolysis reaction of the labile thioester bond that was formed at the N' junction of the domain in step i. This reaction ligates the two BIL domain flanks with a peptide bond and releases the BIL domain from its N'-flank. This step probably occurs immediately after step ii to prevent the dissociation of the C'-flank from the N'-flank and BIL domain. (iv) Finally, the BIL domain C'-aminosuccinimide ring hydrolyzes into Asn or iso-Asn, similarly to inteins (Fig. 4, step 4) (26).
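The four proposed steps can be traced as a sequence of species, which makes the order of bond breaking and bond formation explicit. The Python sketch below is a schematic illustration only, not a kinetic or chemical model; the notation is an assumption introduced here (`-` for a peptide bond, `~` for the thioester, `*` for the terminal aminosuccinimide), and steps i and ii are listed one after the other even though the text describes them as concomitant.

```python
def proposed_a_type_splicing(n_flank="M", domain="B", c_flank="C"):
    """Trace the species present after each step of the proposed pathway.

    Notation (illustrative): '-' peptide bond, '~' thioester bond,
    '*' aminosuccinimide ring at the domain's C-terminal Asn.
    """
    n, d, c = n_flank, domain, c_flank
    return [
        ("precursor", [f"{n}-{d}-{c}"]),
        # (i) N-S acyl shift: upstream peptide bond becomes a thioester
        ("i: N-S acyl shift", [f"{n}~{d}-{c}"]),
        # (ii) Asn cyclization releases the downstream flank
        ("ii: Asn cyclization", [f"{n}~{d}*", c]),
        # (iii) the free amine of the released flank attacks the thioester
        ("iii: aminolysis", [f"{n}-{c}", f"{d}*"]),
        # (iv) the succinimide ring hydrolyzes to Asn or iso-Asn
        ("iv: succinimide hydrolysis", [f"{n}-{c}", d]),
    ]


if __name__ == "__main__":
    for label, species in proposed_a_type_splicing():
        print(f"{label:28s} {' + '.join(species)}")
```

The trace shows the feature that distinguishes this pathway from the canonical one: after step ii the downstream flank is a free species, so failure of step iii leaves cleavage products rather than a spliced protein.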
An aminolysis reaction, involving an attack of the C'-amine on an N'-ester, was proposed previously to occur in intein protein splicing (27,28). A detailed analysis of representative inteins established the canonical protein-splicing mechanism and ruled out aminolysis as part of the process (15,20). Considering our experimental results and the various residues in the +1 position of A-type BIL domains, we suggest that these domains protein splice with an aminolysis reaction.
Recently, aminolysis was proposed as part of a peptide-splicing activity of the proteasome that generates the displayed variant antigenic peptides (8). The cleaved peptides within the proteasome are attached transiently from the C' end to Thr residues by ester bonds (21). Vigneron et al. (8) suggest that the N' end of another cleaved peptide from the same protein attacks this bond in an aminolysis reaction, ligating the two peptides. Aminolysis also occurs in other biological reactions, including the attachment of myristate to the N' end of proteins by N-myristoyltransferase (29).
Why are inteins integrated upstream to Cys, Ser, or Thr residues when, as we show here, protein splicing can proceed with other residues in this position? Being able to successfully integrate in a wider range of sites seems highly advantageous for selfish genetic elements such as inteins (19,30). We believe the answer to this question is related to the differences between the mechanisms for protein splicing in inteins and in A-type BIL domains. The intein domain and its flanks remain covalently attached until ligation of the flanks and release of the intein (Fig. 4). In our proposed mechanism for A-type BIL domains, the C'-flank is detached from the BIL and its N'-flank before its ligation to the N'-flank. This may lead to a higher frequency of N'- and C'-cleavage side products. Such partial splicing in inteins will reduce the amount of mature (spliced) host proteins, which are typically conserved and crucial proteins, and might negatively affect cell survival. Perhaps even more harmful is the possible dominant-negative effect of the cleaved byproducts of intein hosts. In contrast, partial splicing of BIL domains (i.e. N'- and/or C'-cleavage) may serve for increasing the protein host variability (1).
Our results, together with previous reports of other atypical intein protein-splicing mechanisms (9), show that this activity can proceed by several alternative and partially overlapping biochemical reactions. Thus, the canonical intein protein-splicing mechanism may need to be expanded, or its scope may need to be limited. Aminolysis and perhaps other atypical mechanisms may be the way some inteins and other HINT domains protein-splice.
Cleavage Mechanisms of B-type BIL Domains-The B-type BIL domains were found by us to auto-catalytically cleave their N' or C' ends. This activity is analogous to intein protein-splicing side reactions and is common in N-terminal rearrangements of auto-processing proteins (2). Both intein and BIL domains have conserved Cys or Ser in position 1 whose thiol or hydroxyl groups are essential for the acyl rearrangement at the N terminus. Thus, the N'-peptide bond of BIL1-Rsp could be converted into a thioester through the N-S acyl shift, similarly to inteins (Fig. 4, step 1). In inteins, this reaction is followed by trans-esterification of the thioester by the side chain of the +1 residue, forming a branch intermediate and leading to splicing product formation. Such products were not obtained in the BIL1-Rsp precursor expression, suggesting that the labile thioester was hydrolyzed by water or by an external nucleophile. We do not exclude the possibility that this cleavage was coupled to ligation of the upstream flank with an external nucleophile, similar to the attachment of cholesterol to Hedge domain upstream to the Hog HINT domain. Such a ligation would modify the M tag and give it a higher mass. One of the as yet uncharacterized BIL1-Rsp products may correspond to this putative product.
A previously proposed mechanism for C'-cleavage of the Chy R1 intein mutant (9) and for Pab PolII intein (31) can explain the C'-auto-cleavage of BIL2-Rsp. According to this mechanism, an attack of the BIL domain Ser-1 hydroxyl group on a peptide bond carbonyl at the C' region of the domain would form an ester bond through the N-O acyl shift, which in turn can be hydrolyzed, detaching the BIL domain from its C'-flank (9). This proposed mechanism is independent of a C'-nucleophilic residue. Assuming that BIL domains have the HINT fold, their N' end is in a position to cleave their C' region.
Our heterologous conditions of protein expression may alter the native activity of BIL domains. Overexpression in E. coli cells and changes in the domain context (BIL domain flanks), as well as in vitro conditions such as redox environment or temperature, may alter the protein in vivo fold and function. Nevertheless, in light of extensive experiments in other proteins and HINT domains, we assume that the BIL domain activity we observed is related to their native one. Improper folding of flanked B-type BIL domains may have triggered the overexpression of chaperones (DnaK, GroEL) (12). We propose that the chaperones, which were co-purified with B-type BIL but were absent in A-type BIL or the control vector, are not merely byproducts of the heterologous expression system. Chaperones may be involved in the proper folding, extracellular targeting, or biological activity of BIL domains. Attachment of chaperones to the BIL precursor may also spatially block its splicing activity.
Biological Roles of Different Types of HINT Domains-The HINT superfamily currently includes four separate families: inteins; Hogs; A-type BIL; and B-type BIL domains. All of the families are homologous and share sequence, structure, and biochemical properties (2, 4, 6, 32). Yet each family is distinct in specific sequence features, protein host context, and biological roles. Members of each family can be diverse in sequence and are still found occasionally in new protein and phylogenetic contexts. It is likely that other HINT families will be discovered and characterized. Thus, identifying the family of a HINT domain can be an additional challenge to recognizing the domain as a HINT type.
Sequence motifs and structure folds characterizing the HINT superfamily and those specific to inteins, Hogs, and BIL domains have been described previously (1,33,34). Most inteins also include a central homing-endonuclease domain (35) not found in the other known HINT families. Inteins are also integrated in conserved positions of essential proteins. Both these features are a consequence of the selfish element nature of intein genes (19,30). Hog domains are located upstream to the cholesterol-binding domain and downstream to the Hedge domains and to the Wart and Ground domains of nematodes (36). The role of Hog domains in hedgehog proteins and perhaps also in the nematode proteins is post-translational modification in the maturation process of their host protein.
Less information is available for the two known types of BIL domains. Nevertheless, the experimental and computational results we show in this work support our initial hypotheses. Most BIL domains are present in variable positions of non-conserved proteins. Many BIL host proteins also include motifs, repeats, and domains that characterize extracellular protein regions. We show here and in the first report of the BIL domains (1) that the biochemical activity of BIL domains includes protein splicing and auto-cleavage of their hosts. We suggest that the biological role of BIL domains is to increase the variability of their hosts, mainly in extracellular protein regions, by cis- and trans-ligation of proteins and other moieties to the hosts.