An intron-encoded protein assists RNA splicing of multiple similar introns of different bacterial genes.

Four group II introns were found in an unusually intron-rich dnaN gene (encoding the beta subunit of DNA polymerase III) of the cyanobacterium Trichodesmium erythraeum, and they have strong similarities to two introns of the RIR gene (encoding ribonucleotide reductase) of the same organism. Of these six introns, only the RIR-3 intron encodes a maturase protein and showed efficient RNA splicing when expressed in Escherichia coli cells. The other five introns do not encode a maturase protein and did not show RNA splicing in E. coli. But these maturase-less introns showed efficient RNA splicing when the RIR-3 intron-encoded maturase protein was co-expressed from a freestanding gene in the same cell. These findings demonstrated that an intron-encoded protein could function as a general maturase for multiple introns of different genes. Major implications may include an intron-mediated co-regulation of the different genes and a resemblance of the evolutionary origin of spliceosomal introns.

Group II introns are mobile genetic elements and presumed evolutionary progenitors of nuclear spliceosomal introns (1,2). RNA sequences of group II introns can be folded into a conserved six-domain structure with properties of a catalytic RNA (3,4). Group II introns are found more abundantly in mitochondria and chloroplasts (5,6), where they reside in conserved genes. In contrast, most bacterial group II introns were found in mobile DNA or outside of genes (5,7), which may suggest barriers against intron insertion in conserved bacterial genes. To splice in vivo, group II introns require maturases or other proteins that help the intron RNA fold into a catalytically active structure (1,3), although some group II introns can self-splice in vitro under non-physiological conditions. Most bacterial group II introns and some organelle group II introns encode a protein, typically in the domain IV loop region of the folded intron structure. The intron-encoded protein functions as a maturase to assist RNA splicing, in addition to assisting intron retrohoming or retrotransposition (8,9).
Previously characterized intron-encoded maturase proteins specifically assist the intron that encodes the maturase, although it is possible that closely related maturases may cross-react. In mitochondria and chloroplasts, many group II introns do not encode a maturase protein, and at least some of the maturase-less introns have recruited nuclear proteins to assist the RNA splicing (1). In chloroplasts, indirect evidence has suggested that maturase-less group II introns may be assisted by a maturase protein encoded in another intron (10,11), but this has not been demonstrated. In bacteria, maturase-less group II introns are rare and were found recently in an archaeon and a thermophilic cyanobacterium (12). Most bacterial introns are located outside genes or conserved genes, and it is not known whether or how the maturase-less introns carry out RNA splicing.
Here we report an unusual case of four maturase-less group II introns in a conserved bacterial protein gene, which is the largest number of introns known in a bacterial gene. More interestingly, these maturase-less introns could undergo RNA splicing in Escherichia coli when assisted by a maturase protein encoded in another intron of a different gene. This demonstrated for the first time that an intron-encoded protein could function as a general maturase for multiple introns of different genes, which has interesting implications for intron evolution and for possible roles of introns in coordinating gene expression.

EXPERIMENTAL PROCEDURES
Gene Cloning and Sequence Analysis-The dnaN gene, or parts of it, was amplified from genomic DNA of Trichodesmium erythraeum strain IMS101 by PCR using the high fidelity DNA polymerase Phusion (Finnzymes). The resulting DNAs were cloned in the plasmid vector pDrive (Qiagen) at the T/A cloning site, and DNA sequences were determined through automated DNA sequencing. GenBank searches, sequence alignments, and intron RNA folding were performed using the BLAST search program (13), the Clustal W program (14), and the Mfold program (15), respectively.
RNA Splicing Analysis-Intron-containing DNAs were PCR-amplified and inserted in a ColEI-derived pDrive plasmid vector behind a Ptac promoter. This plasmid has an ampicillin resistance gene, and the Ptac promoter is inducible with isopropyl 1-thio-β-D-galactopyranoside (IPTG). The coding sequence of the maturase protein was PCR-amplified and inserted in a modified pAR plasmid vector behind a PBAD promoter. This plasmid was derived from a pACYC184 plasmid that is compatible with ColEI-derived plasmids; it has a chloramphenicol resistance gene, and the PBAD promoter is inducible with L-arabinose. Site-specific mutations (single nucleotide insertion in the maturase coding sequence and deletion of the domain IV loop region from some introns) were produced through inverse PCR using specific primers at the sites of mutation. DNA sequences of cloned DNAs were confirmed through automated DNA sequencing. Individual recombinant plasmids were introduced into E. coli cells through electroporation. For co-expression experiments, two plasmids were introduced sequentially into the same E. coli cells, and transformed cells were selected and maintained in the presence of both ampicillin and chloramphenicol.
For RNA splicing assays, E. coli cells containing the specified plasmid were grown in liquid Luria Broth medium at 37°C to mid-log phase (A600, 0.3). IPTG was then added to a final concentration of 0.8 mM to induce transcription of the intron-containing gene, and the induction was continued for 2 h. For cells that also contained the maturase-encoding plasmid, L-arabinose was added to a final concentration of 0.2% to induce production of the maturase protein 30 min before adding IPTG. Total RNAs of the induced cells were extracted and treated with RNase-free DNase using the RNeasy kit (Qiagen). Reverse transcription and subsequent PCR were carried out using specified primers and the reverse transcription (RT)-PCR kit (Qiagen). The resulting DNA products were identified by their predicted sizes and confirmed through cloning and DNA sequencing.

RESULTS
Identification and Sequence Analysis of dnaN Gene and Its Introns-The recent findings of group II introns in an RIR gene (16) prompted us to find similar introns in related genes in the nearly complete genome sequence of the oceanic N2-fixing cyanobacterium T. erythraeum strain IMS101 (www.jgi.doe.gov), which had been determined at the United States Department of Energy Joint Genome Institute. This search revealed an intron-rich dnaN gene as illustrated in Fig. 1, whose 6,510-bp sequence consists of five exons and four group II introns. The five exons are 30, 55, 158, 303, and 615 bp long, respectively, and together they predict a 387-amino acid protein sequence that is highly similar to DnaN sequences of related organisms. As shown in Fig. 1, the predicted DnaN protein sequence is 54-57% identical and 70% similar to intron-less DnaN sequences of other cyanobacteria. It is also 23% identical and 42% similar to the functionally characterized E. coli DnaN protein sequence (data not shown).
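The lengths quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis; the function name and dictionary keys are illustrative. It takes the five exon lengths and the 6,510-bp total from the text, derives the combined intron length by subtraction, and confirms that the exons sum to a multiple of three consistent with the predicted 387-residue protein.

```python
# Values taken from the text: exon lengths (bp) and total gene length (bp).
EXON_LENGTHS = [30, 55, 158, 303, 615]
GENE_LENGTH = 6510


def summarize_gene(exon_lengths, gene_length):
    """Return basic length bookkeeping for an intron-containing gene."""
    coding = sum(exon_lengths)
    intronic = gene_length - coding  # everything that is not exon
    return {
        "coding_bp": coding,
        "intronic_bp": intronic,
        "codons": coding // 3,
        "in_frame": coding % 3 == 0,
        "intron_fraction": intronic / gene_length,
    }


summary = summarize_gene(EXON_LENGTHS, GENE_LENGTH)
print(summary)
```

Under these numbers the exons total 1,161 bp (387 codons) and the introns account for 5,349 bp, or roughly 82% of the gene.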
The four introns in the dnaN gene, referred to as dnaN-1, -2, -3, and -4 introns, are 1408, 1128, 1835, and 978 nt long, respectively. Boundaries of these introns were easily predicted through sequence comparisons with intron-less dnaN genes of closely related organisms (Fig. 1). As shown in Fig. 2, the four introns are similar to one another and also to two other introns previously identified in the same organism. The previously identified introns are the T.er.I3 and T.er.I4 introns in the RIR gene encoding a ribonucleotide reductase (16), and they are referred to as RIR-2 and RIR-3 introns, respectively, to be consistent with the naming of the dnaN-1, -2, -3, and -4 introns and to allow easier identification of the host gene of each intron. These six introns of the dnaN and RIR genes are 75-85% identical in a ~680-nt catalytic core region, which can be folded into a conserved six-domain structure typical of group II introns. The folded structures are not shown, because they are very similar to published structures of previously identified introns of this organism (17).

* This work was supported by a research grant from the National Science and Engineering Research Council of Canada. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 To whom correspondence should be addressed. Tel.: 902-494-1208; Fax: 902-494-1355; E-mail: pxqliu@dal.ca.
Outside the folded catalytic core, the six introns differ greatly in size in the domain IV loop region (Fig. 2), where a maturase-coding open reading frame is usually located. Among the six introns, only the RIR-3 intron has a complete maturase-coding sequence in domain IV of its folded structure. None of the others has a maturase-coding open reading frame, although various amounts of apparent remnants of maturase-coding sequences could be recognized. Exon-binding sequences (EBS1 and EBS2) and intron-binding sequences (IBS1 and IBS2) could also be predicted for each of the six introns (Fig. 2), which are known to facilitate RNA splicing by forming base-pairings between the intron and the exon (18). As expected, EBS1 and EBS2 are in the loop sequence of Domain I, and the corresponding IBS1 and IBS2 are in the 5′-exon immediately before the intron, with EBS1 and EBS2 base-pairing with IBS1 and IBS2, respectively. The EBS1 and EBS2 of the six introns are located at corresponding positions in the different introns, but their sequences are not similar (Fig. 2).
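The EBS-IBS relationship described above amounts to antiparallel base-pairing between a short intron segment and a short exon segment. The sketch below illustrates one way such a prediction could be scored; it is not the procedure used in this study. The sequences are hypothetical placeholders (the actual EBS and IBS sequences are in Fig. 2), and allowing G-U wobble pairs alongside Watson-Crick pairs is an assumption of this example.

```python
# Allowed RNA pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def count_base_pairs(ebs, ibs):
    """Count paired positions between two equal-length RNA segments.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the
    first EBS base is compared with the last IBS base, and so on.
    """
    if len(ebs) != len(ibs):
        raise ValueError("EBS and IBS must have the same length")
    return sum((e, i) in ALLOWED_PAIRS for e, i in zip(ebs, reversed(ibs)))


# Hypothetical 6-nt sequences, chosen only to illustrate the calculation.
hypothetical_ibs1 = "GUGUCG"
hypothetical_ebs1 = "CGACAC"  # reverse complement of the IBS above

full_match = count_base_pairs(hypothetical_ebs1, hypothetical_ibs1)
one_mismatch = count_base_pairs(hypothetical_ebs1, "GUGUCA")
print(full_match, one_mismatch)
```

A fully complementary pair scores one base pair per position, whereas a single substitution in the exon sequence lowers the count by one; this is the sense in which EBS and IBS sequences can differ between introns while each intron still matches its own 5′-exon.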
RNA Splicing Activity of the Intron-To determine whether the six similar introns of dnaN and RIR genes are capable of RNA splicing in E. coli, each intron coding sequence was inserted in a plasmid expression vector to construct an intron-containing recombinant gene (Fig. 3A). Various lengths of the native exon sequences were included with the introns, because this is usually required for efficient RNA splicing. After transcription of the recombinant gene and RNA splicing in E. coli, total RNAs were isolated and reverse-transcribed, and the resulting cDNAs were PCR-amplified using specific oligonucleotide primers. This RT-PCR procedure produces DNA fragments corresponding to the precursor RNA and the spliced RNA, whose predicted sizes are listed in Fig. 3A. The predicted sizes of different precursors vary according to the different sizes of the introns. The predicted sizes of the spliced products also differ with different introns, because the sizes of the native exons that were included with the introns differ with different introns.
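The logic behind the predicted RT-PCR sizes can be written out explicitly: the precursor product spans both flanking regions plus the intron, and the spliced product spans the flanking regions only, so the two differ by exactly the intron length. The following is a minimal sketch under stated assumptions. The flank lengths are hypothetical (the actual amplicon sizes are those listed in Fig. 3A), and only the 978-nt length of the dnaN-4 intron comes from the text.

```python
def rt_pcr_product_sizes(upstream_flank, intron_length, downstream_flank):
    """Predict RT-PCR product sizes (bp) for unspliced and spliced RNA.

    upstream_flank and downstream_flank are the amplified lengths on either
    side of the intron, measured from the primer ends to the splice sites.
    """
    spliced = upstream_flank + downstream_flank
    precursor = spliced + intron_length
    return {"precursor": precursor, "spliced": spliced}


# Hypothetical flank lengths; the intron length is that of dnaN-4 (978 nt).
sizes = rt_pcr_product_sizes(upstream_flank=100, intron_length=978,
                             downstream_flank=150)
print(sizes)
```

With these illustrative flanks, the precursor and spliced bands would be expected at 1,228 and 250 bp, a difference large enough to resolve easily on an agarose gel.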
The RT-PCR products corresponding to individual introns were resolved by agarose gel electrophoresis and are shown in Fig. 3B. DNA bands corresponding to the precursor and spliced RNAs were identified initially by matching their observed sizes with the predicted sizes listed in Fig. 3A. DNA bands corresponding to the spliced RNAs were inserted into cloning plasmids and subjected to sequence determination (data not shown), which confirmed their identities as well as the predicted splice junctions. For the complete RIR-3 intron that encodes a maturase protein, only a band corresponding to the spliced RNA was observed (lane 2), indicating complete RNA splicing. To see whether the intron-encoded maturase is required for the RNA splicing, we deleted the maturase coding sequence except for the first 15 codons. The resulting intron (RIR-3d) produced only the precursor band (lane 3), indicating that the maturase coding sequence was required for the splicing. We then expressed the maturase coding sequence as a freestanding gene on a separate plasmid in the same cell, and this restored the splicing activity to the RIR-3d intron (lane 4). This indicated that the maturase protein can act in trans, and the maturase coding sequence does not need to be a part of the intron.
For the RIR-2 intron that does not encode a maturase protein, only a band corresponding to the precursor RNA was observed (lane 5), indicating a lack of RNA splicing. However, efficient splicing of the RIR-2 intron was observed when the maturase protein from the RIR-3 intron was co-produced in the same cell from a freestanding gene on a separate plasmid (lane 6). This indicated that the maturase protein from the RIR-3 intron acted in trans on the maturase-less RIR-2 intron to assist its RNA splicing. Similarly, the four introns (dnaN-1, dnaN-2, dnaN-3, and dnaN-4) of the dnaN gene do not encode a maturase protein, and each produced only a band corresponding to the precursor RNA (lanes 7, 9, 11, and 13), indicating a lack of RNA splicing. However, all showed efficient RNA splicing when the maturase coding sequence from the RIR-3 intron was co-expressed in the same cell as a freestanding gene on a separate plasmid (lanes 8, 10, 12, and 14). This indicated that the maturase protein from the RIR-3 intron acted in trans on each of these maturase-less introns to assist their RNA splicing. To confirm that the maturase protein, not its coding sequence, acted on the maturase-less introns, we disrupted the maturase protein synthesis by introducing a frameshift mutation (single nucleotide insertion) at nucleotide position 1178 of the 1770-bp maturase gene, and this abolished the RNA splicing activity of all the maturase-less introns (data not shown).
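The effect of the single nucleotide insertion can be illustrated with a short calculation. The sketch below is not the authors' code; it assumes 1-based nucleotide numbering from the first base of the coding sequence and uses a made-up 12-nt toy sequence to show how an insertion changes every codon downstream of the insertion point. Only the position (1178) and gene length (1770 bp) are taken from the text.

```python
def codon_index(nt_position):
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    return (nt_position - 1) // 3 + 1


def split_codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_nucleotide(seq, nt_position, nucleotide):
    """Insert one nucleotide immediately before the given 1-based position."""
    return seq[:nt_position - 1] + nucleotide + seq[nt_position - 1:]


# Numbers from the text: insertion at position 1178 of a 1770-bp gene.
total_codons = 1770 // 3
affected_codon = codon_index(1178)

# Toy sequence (hypothetical) showing the downstream reading-frame change.
toy = "ATGGCTGAAACC"
toy_mutant = insert_nucleotide(toy, 5, "T")
print(total_codons, affected_codon)
print(split_codons(toy), split_codons(toy_mutant))
```

Under this numbering, the insertion falls in codon 393 of 590, so roughly the last third of the reading frame is shifted while the mRNA itself is nearly unchanged, which is what makes the mutant a useful test of protein versus RNA function. In the toy example, the insertion also happens to create a premature TGA stop codon, a common consequence of frameshifts.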
In group II introns, the domain IV loop region is outside the catalytic core and generally not conserved. Among the six introns studied here, the domain IV loop region varies greatly in size (Fig. 2), but the terminal parts of this region show a high level of sequence conservation (Fig. 4A). To test whether the domain IV loop region is required for RNA splicing, we deleted from three introns (RIR-3, dnaN-3, and dnaN-4) the entire domain IV loop sequence except the five terminal nucleotides AATGG. The resulting deletion introns (RIR-3D2, dnaN-3D, and dnaN-4D) were tested for RNA splicing in comparison with the original introns (Fig. 4B). Unlike the original introns, none of the deletion introns showed RNA splicing, whether in the presence or in the absence of the maturase from the RIR-3 intron, indicating that the domain IV loop region of these introns is required for the RNA splicing.

DISCUSSION
We have demonstrated for the first time that an intron-encoded protein could function as a general maturase for the RNA splicing of multiple maturase-less introns of different genes. This suggests that the RIR-3 intron-encoded maturase can recognize certain common structural elements present in all six introns. Previous studies of the bacterial Ll.LtrB intron and the yeast aI2 intron have identified intron-specific structural elements that are recognized by the intron-specific maturases encoded in these introns (8,19). They included a high affinity binding site at the beginning of the maturase's own coding sequence, located in the domain IV loop region of the intron, which forms an idiosyncratic structure named DIVa, in addition to secondary binding sites in the intron catalytic core. The six introns of the dnaN and RIR genes share ~80% sequence identity in the intron catalytic core (Fig. 2), which may provide common binding sites for the general maturase. They also show a high level of sequence conservation in a putative DIVa region corresponding to the beginning of the coding sequence of the maturase protein (Fig. 4A), although five of the six introns have none or only remnants of a maturase-coding sequence. This putative DIVa region constitutes nearly the entire domain IV loop region in the dnaN-4 intron or a part of the domain IV loop region in the other introns. The domain IV loop region was shown to be necessary for RNA splicing (Fig. 4B), which is consistent with a hypothesis that the putative DIVa region may function in maturase binding as in the well studied Ll.LtrB intron, and its conserved sequence may provide common binding sites for the general maturase to act on all six introns.

[Fig. 3 legend, fragment: ... 3, 5, 7, 9, 11, and 13) or with (+) a maturase from the RIR-3 intron co-expressed as a freestanding gene on a separate plasmid (lanes 4, 6, 8, 10, 12, and 14).]
A general maturase assisting multiple introns of different genes may suggest a novel, intron-mediated mechanism of coordinating gene expression, in which the production or function of the general maturase could control the functional expression (transcript maturation) of both the RIR gene and the dnaN gene. The two genes are functionally related in nucleotide synthesis and DNA replication, with the dnaN gene encoding the sliding DNA clamp of DNA polymerase III and the RIR gene encoding a ribonucleotide reductase. The dnaN and RIR genes are expected to be functional in vivo despite their multiple introns, because each gene predicts a protein highly conserved with homologous proteins of other bacteria, and because no additional copy of the gene was predicted from the genome sequence. The RNA splicing activities of the dnaN and RIR introns were studied in E. coli; therefore it is not certain that our findings will apply in the native organism. We have not been able to study the RNA splicing in the native organism T. erythraeum, due to extreme difficulties of culturing this organism in the laboratory. It is also not known whether the general maturase acts on additional group II introns of other genes in this organism; therefore the above hypothesis of intron-mediated gene regulation remains to be proven.
The finding of a general maturase for multiple group II introns may also have implications for the evolution of nuclear spliceosomal introns. It is thought that group II introns have given rise to spliceosomal introns (2). A likely early step may be the emergence of a general maturase that assisted the RNA splicing of multiple introns, which would eventually lead to the evolution of a general splicing enzyme like the spliceosome. A general maturase might emerge more easily from a preexisting intron-specific maturase by recognizing conserved structures of the group II introns and not needing idiosyncratic structures of specific introns. The presence of multiple similar introns in the same organism, like the six similar introns in T. erythraeum, may facilitate this evolutionary process. Once the maturase from one intron can assist the RNA splicing of other introns, the maturase coding sequences of the other introns can be lost through mutation. The resulting maturase-less introns become dependent on the maturase-containing intron for their RNA splicing and therefore for host gene function. This dependence should prevent a loss of the maturase-containing intron, except when the maturase coding sequence becomes a freestanding gene. Freestanding maturase-like genes have been found in plants and are thought to have derived from group II introns (20), although their biochemical function remains to be determined. A freestanding maturase gene has not been found in T. erythraeum.