A Novel Family of Sequence-specific Endoribonucleases Associated with the Clustered Regularly Interspaced Short Palindromic Repeats

Clustered regularly interspaced short palindromic repeats (CRISPRs) together with the associated CAS proteins protect microbial cells from invasion by foreign genetic elements using presently unknown molecular mechanisms. All CRISPR systems contain proteins of the CAS2 family, suggesting that these uncharacterized proteins play a central role in this process. Here we show that the CAS2 proteins represent a novel family of endoribonucleases. Six purified CAS2 proteins from diverse organisms cleaved single-stranded RNAs preferentially within U-rich regions. A representative CAS2 enzyme, SSO1404 from Sulfolobus solfataricus, cleaved the phosphodiester linkage on the 3′-side and generated 5′-phosphate- and 3′-hydroxyl-terminated oligonucleotides. The crystal structure of SSO1404 was solved at 1.6 Å resolution, revealing the first ribonuclease with a ferredoxin-like fold. Mutagenesis of SSO1404 identified six residues (Tyr-9, Asp-10, Arg-17, Arg-19, Arg-31, and Phe-37) that are important for enzymatic activity and suggested that Asp-10 might be the principal catalytic residue. Thus, CAS2 proteins are sequence-specific endoribonucleases, and we propose that their role in the CRISPR-mediated anti-phage defense might involve degradation of phage or cellular mRNAs.

On the chromosome, CRISPR loci are flanked by a large number of cas (CRISPR-associated) genes encoding uncharacterized proteins. A comprehensive bioinformatic analysis of the CAS system in sequenced genomes resulted in a refined classification with 25 gene families and at least nine types of the cas operon organization (9,17). Eight CAS protein families have been predicted to possess nuclease activity; nine families have been characterized as putative RNA-binding proteins (RAMP domain proteins), and two families have been predicted to possess helicase and DNA/RNA polymerase activity (17). This analysis, combined with the data on the homology of some CRISPR spacer sequences to phage genes, led to the hypothesis that the CRISPRs and cas-encoded proteins comprise a system of defense against invading phages and plasmids and that this system might operate analogously to the eukaryotic RNA interference systems (17). The wide distribution of the CRISPR-CAS system among archaea and bacteria, its apparent importance for immunity of prokaryotes against infectious agents, and the predicted novel mechanism have recently made this system a subject of intense interest (3).
Two cas genes (cas1 and cas2) are always located near a CRISPR locus and are found only in species containing CRISPRs, suggesting that these proteins play a central role in the CRISPR system (1,17). The members of the CAS2 superfamily are small, uncharacterized proteins (80-120 residues), which belong to the COG1343 and COG3512 groups of the COG protein classification system (18). CAS2 proteins contain several conserved sequence motifs, in particular an N-terminal motif that consists of a string of hydrophobic residues (a predicted β-strand) and typically ends with an aspartate (17). The CAS2 protein sequences show some similarity to the sequences of the VapD family of uncharacterized proteins that are functionally linked to the VapBC toxin-antitoxin (TA) operon (19). Based on the pattern of conserved amino acid residues, in particular the presence of a conserved aspartate after a predicted β-strand, and some functional clues on the TA systems, it has been hypothesized that both CAS2 and VapD might possess RNase activity (17).
Here we report for the first time the results of biochemical and structural characterization of a family of CRISPR-associated enzymes, the CAS2 family proteins from five prokaryotes.
We show that CAS2 proteins are endoribonucleases that are specific to single-stranded (ss)RNAs and preferentially cleave them within U-rich regions. The crystal structure of a representative CAS2 protein, SSO1404 from Sulfolobus solfataricus, was solved to 1.6 Å resolution and revealed a ferredoxin-like fold with the double split β-α-β motif, as well as the putative active site.

EXPERIMENTAL PROCEDURES
Protein Overexpression, Purification, and Site-directed Mutagenesis-The cloning of the genes encoding SSO1404 and other CAS2 proteins (SSO8090, TM1796, AF1876, MTH1083, and NE0845) into the modified pET15b was carried out as described previously (20). The proteins were expressed as a fusion with an N-terminal His6 tag in Escherichia coli strain BL21 (DE3) and purified to more than 95% homogeneity using metal-chelate affinity chromatography on nickel affinity resin and gel filtration on a Superdex 200 26/60 column (Amersham Biosciences) as described before (20,21). Site-directed mutagenesis of SSO1404 was performed as described previously (21) using a protocol based on the QuikChange site-directed mutagenesis kit (Stratagene).
Preparation of RNA Substrates-The short RNA substrates (Table 1) were purchased from IDT. The oligonucleotides were 5′-end-labeled with [γ-32P]ATP (6,000 Ci/mmol; Amersham Biosciences) and T4 polynucleotide kinase (PNK) (Fermentas) and then purified by denaturing PAGE (15% polyacrylamide, 8 M urea gel). The labeled oligonucleotides were eluted from the gel, precipitated with 2% LiClO4 in acetone, washed with acetone, dried, and dissolved in diethyl pyrocarbonate-treated Milli-Q water. The long RNA substrates were synthesized using the Ambion T7 RNA polymerase MAXIscript transcription kit.
Enzymatic Assays-The reaction mixture for RNase assays (10 μl) contained 0.1 μM [32P]RNA, 50 mM Tris-HCl (pH 8.5), 100 mM KCl, 5 mM MgCl2, 1 mM dithiothreitol, and 0.01-0.1 μg of enzyme. The pH dependence of SSO1404 was characterized using three buffers: MES-K (pH 5.5 to 6.5), Tris-HCl (pH 7.0 to 9.0), and CAPS-K (pH 9.4 to 11.0). The reaction mixture was incubated at 37°C for the indicated period of time and quenched by the addition of an equal volume of formamide loading buffer (80% formamide, 0.025% bromphenol blue, 0.025% xylene cyanol, and 10 mM EDTA (pH 8.0)). The reaction products were resolved by electrophoresis in 15% polyacrylamide (PAA), 8 M urea gels using TBE (10 mM Tris borate (pH 8.3) and 2 mM EDTA) as a running buffer. As nucleotide size markers, an imidazole ladder or a G-ladder produced by partial RNA cleavage by 2 M imidazole or RNase T1, respectively, was used (22,23). For the analysis of the 5′-end of the RNA products, after the RNase reaction the RNA products were precipitated with 2% LiClO4, washed with acetone, dried, dissolved in Milli-Q water, and phosphorylated with [γ-32P]ATP and T4 PNK using conditions for the forward or phosphate exchange reaction according to the manufacturer's protocol (Fermentas). After the PNK reaction, RNA products were analyzed using denaturing 15% PAA, 8 M urea gels as described above for RNase assays.
The reaction mixtures for DNase assays (40 μl) contained 50 mM HEPES-K buffer (pH 7.5), 100 mM KCl, 5 mM MgCl2, 1 mM dithiothreitol, 0.3 μg of DNA (double-stranded DNA) or 0.75 μg of M13 DNA (ssDNA), and 1-4 μg of enzyme. After 1 h of incubation at 37°C, the reactions were quenched by the addition of 6× DNA loading dye (Fermentas) and analyzed on EtBr-stained 1% agarose gels.
Protein Crystallization and Structure Determination-SSO1404 crystals were grown using the hanging drop vapor diffusion method with the drops containing a mixture of 2 μl of 10 mg/ml purified selenomethionine-incorporated SSO1404 protein and 2 μl of reservoir buffer (0.2 M NaI, 20% w/v PEG 3350, and 2% v/v isopropyl alcohol). For diffraction studies, the crystals were stabilized with the crystallization buffer supplemented with 20% ethylene glycol as a cryoprotectant and flash-frozen in liquid nitrogen. A single crystal of selenomethionine-incorporated SSO1404 was used to collect diffraction data at beamline 19-BM of the Structural Biology Center of the Advanced Photon Source (24) and was maintained at a temperature of 100 K. A single-wavelength anomalous diffraction dataset was collected at a wavelength of 0.9794 Å. Crystallographic data collection and model refinement statistics are summarized in Table 2. Reflection data were collected, indexed, integrated, and scaled with HKL-3000 (25).
A two-site selenium substructure was determined; the structure was phased by single-wavelength anomalous diffraction, and an initial model was built. Structure solution and initial model building were performed with HKL-3000, which is integrated with SHELXD, SHELXE, MLPHARE, DM, O, COOT, SOLVE, RESOLVE, and ARP/wARP (25-32). The initial model was improved by iterative cycles of manual rebuilding in COOT, followed by maximum likelihood refinement with REFMAC5 (33). In later stages of refinement, a multigroup TLS model generated by the TLSMD web server was used to further improve the model. The final model was validated using Molprobity (34), SFCHECK (35), and PROCHECK. The atomic coordinates and structure factors for SSO1404 have been deposited in the Protein Data Bank with the accession code 2i8e.
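As a quick consistency check on the data collection wavelength reported above (0.9794 Å), a wavelength can be converted to photon energy with E = hc/λ. The sketch below is illustrative only: the function name and the rounded constant hc ≈ 12.398 keV·Å are assumptions introduced here and are not part of the original methods.

```python
# Illustrative sketch (not from the original methods): convert an X-ray
# wavelength in angstroms to photon energy in keV using E = hc / wavelength.

HC_KEV_ANGSTROM = 12.398  # rounded value of h*c in keV*angstrom (assumed constant)


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # 0.9794 angstrom is the data collection wavelength stated in the text.
    energy = wavelength_to_energy_kev(0.9794)
    print(f"{energy:.2f} keV")
```

The result is close to the selenium K absorption edge, which is what one would expect for an anomalous diffraction experiment on a selenomethionine-incorporated crystal.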

RESULTS AND DISCUSSION
Enzymatic Activity of CAS2 Proteins-To characterize the biochemical activity of CAS2 proteins, we cloned and purified six members of this family from different organisms as follows: SSO1404 and SSO8090 from S. solfataricus, AF1876 from Archaeoglobus fulgidus, TM1796 from Thermotoga maritima, MTH1083 from Methanobacterium thermoautotrophicum, and NE0845 from Nitrosomonas europaea. Given the prediction that CAS2 proteins might possess nuclease activity (17), the purified proteins were tested for nuclease activity against double-stranded DNA, ssDNA, and ssRNA substrates (36 or 39 nt). No nuclease activity was found against either of the DNA substrates, but all proteins degraded the ssRNAs (Fig. 1). With the tested ssRNA substrates, the CAS2 proteins generated a limited number (one to five) of products of various lengths (7-29 nt), indicating that they cleave ssRNAs endonucleolytically. The six proteins produced similar but not identical product patterns, and most cleavage sites contained one or two U residues (Fig. 1). This observation suggests that the CAS2 ribonucleases recognize similar RNA sequences but also display some difference in substrate preference. SSO1404 also showed detectable cleavage of a long model RNA substrate (the 304-nt transcript of the 5′-fragment of the mouse β-actin gene) and generated several products of various lengths (18-200 nt), but the activity was lower than that against short oligoribonucleotides (data not shown). Thus, the CAS2 family proteins are ssRNA-specific endoribonucleases.
Reaction Requirements and RNA Cleavage Products of SSO1404-SSO1404 exhibited RNase activity over a broad pH range (7.0-10.0) with maximum activity at pH 8.5-9.0 (Fig. 2A). Very little cleavage of RNA5 by SSO1404 (between U9 and U10) was observed in the absence of both monovalent and divalent cations, as well as in the presence of Na+, K+, or Mn2+ (Fig. 2B). Cleavage at this site was greatly enhanced by the addition of Mg2+. The Mg2+-dependent activity was further stimulated by K+, whereas Na+ had an inhibitory effect. Thus, SSO1404 requires Mg2+ (5 mM), K+ (100 mM), and pH 8.5-9.0 for optimal cleavage of ssRNA.
RNases can be subdivided into two groups depending on the position of the cleavage of the phosphodiester linkage (36). Enzymes of the first class cleave the bond on the 3′-side (producing a 5′-phosphorylated second product) and include numerous intracellular endo- and exoribonucleases. The enzymes of the second class (e.g. RNase A, RNase T1, and barnases) cleave the linkage on the 5′-side, releasing products containing a cyclic 2′,3′-phosphodiester bond. The products of ssRNA cleavage by a representative CAS2 protein, SSO1404, were characterized using the T4 PNK-catalyzed reactions of RNA phosphorylation (at 5′-hydroxyl termini) and the phosphate exchange between the 5′-phosphate groups of the oligonucleotide substrate and ATP (37). After hydrolysis of the RNA5 substrate by SSO1404, the reaction products were incubated with PNK and [γ-32P]ATP and analyzed by denaturing PAGE and autoradiography (Fig. 2C). Incubation with PNK did not produce a labeled 3′-end product (no second band on the gel), and the gel showed only one labeled band corresponding to the 5′-end product (Fig. 2C, lane 2). This suggests that the 3′-end product is phosphorylated on its 5′-end. The presence of this 3′-end product in the reaction mixture with SSO1404 was confirmed using the PNK-catalyzed phosphate exchange reaction between the RNA 5′-phosphate and [γ-32P]ATP in the presence of excess ADP (38). This experiment produced two labeled products on the gel with the expected lengths of 9 and 30 nt (Fig. 2C, lane 3). Thus, SSO1404 and apparently other CAS2 enzymes are 5′-phosphomonoester-producing endoribonucleases.
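The expected product lengths quoted above follow from simple arithmetic on the substrate length and the cut position. The sketch below is a minimal illustration, assuming 1-based nucleotide numbering and a single cut; the function name is hypothetical and introduced here only for the example.

```python
# Illustrative sketch: fragment lengths from one endonucleolytic cut.
# Positions are 1-based; a cut "between N9 and N10" leaves 9 nt on the 5' side.


def cleavage_fragments(substrate_length: int, last_5prime_nt: int) -> tuple[int, int]:
    """Return (5'-fragment length, 3'-fragment length) for a single cut."""
    if not 0 < last_5prime_nt < substrate_length:
        raise ValueError("cut must fall inside the substrate")
    return last_5prime_nt, substrate_length - last_5prime_nt


if __name__ == "__main__":
    # RNA5 is 39 nt long and is cut between U9 and U10 (values from the text).
    five_prime, three_prime = cleavage_fragments(39, 9)
    print(five_prime, three_prime)  # 9 30
```

For the 39-nt RNA5 cut between U9 and U10, this gives the 9-nt 5′-end product and the 30-nt 3′-end product seen in the phosphate exchange experiment.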
Endoribonuclease Activity of SSO1404 against Long CRISPR Substrates-Previous analyses of small RNAs produced in S. solfataricus and A. fulgidus revealed that the long CRISPR transcripts are processed into fragments with the size of one repeat and one spacer (62-75 nt) (39,40). It has been proposed that these short CRISPR RNAs are produced by an unknown cas RNase that would cleave within the repeat sequence, producing intact spacers flanked with the repeat fragments (3). The demonstration of ssRNase activity of SSO1404 suggested the possibility that CAS2 could be the RNase that is responsible for processing CRISPR RNAs. Thus, we tested SSO1404 for the ability to cleave the 270-nucleotide-long CRISPR transcript of the S. solfataricus CRISPR cluster-2. The uniformly labeled long CRISPR RNA substrate was prepared using T7 RNA polymerase and the 270-bp DNA fragment of the 5′-end of the S. solfataricus CRISPR cluster-2 containing the upstream region, repeat-1, spacer-1, repeat-2, and a 27-nt fragment of spacer-2 as the template. If SSO1404 were a CRISPR-processing endoribonuclease, it would be expected to cleave this substrate within the repeat sequences and produce a minimal product containing one spacer flanked by two repeat halves (63-66 nt long), as well as a series of products containing two or more repeat + spacer units. SSO1404 showed detectable endoribonuclease activity with this long CRISPR RNA substrate, but it produced products that were smaller than expected (23, 27, and 45 nt long), as well as several longer fragments (65, 75, 95, 110, and 160 nt long) (Fig. 3A). This finding indicates that SSO1404 does not have the expected specificity toward CRISPR transcripts and thus is unlikely to be the CRISPR-processing endoribonuclease.
SSO1404 Cleaves ssRNAs Preferentially within U-rich Regions-The substrate specificity of SSO1404 was further characterized using an extended set of shorter synthetic ssRNA substrates, as well as several dsRNAs. The RNA1 to RNA5 substrates were identical in sequence to the sense strands of repeat-1 (24 nt), repeat-2 (25 nt), and three spacers (1-3) of the S. solfataricus CRISPR cluster-2 (Table 1). SSO1404 cleaved all these substrates, but the highest activity was observed with RNA5, which was preferentially cleaved between U9 and U10 (Fig. 3B). The hydrolysis of this substrate proceeded to near completion with the formation of one main labeled product.
Eight ssRNA substrates (RNA7 to RNA10 and RNA14 to RNA17) corresponded to repeats and spacers from other CRISPR-containing organisms (S. thermophilus, T. maritima, and Methanococcus jannaschii), whereas RNA11, RNA12, and RNA13 were scrambled RNA substrates (Table 1). SSO1404 demonstrated endonucleolytic activity toward most of these substrates with the highest activity toward RNA9, RNA10, and RNA13 (Fig. 3C). The analysis of the cleavage sites showed that SSO1404 preferred U-rich sequences, and in many substrates (e.g. RNA9, RNA10, RNA12, and RNA13) the cleavage site was located between two Us (Table 1). Examination of the computer-predicted secondary structures of the cleavable RNA substrates revealed that SSO1404 cleaves RNA substrates in predicted single-stranded regions (Fig. 3). No cleavage was observed with the dsRNA substrates prepared by annealing ssRNA substrates (RNA5, RNA25, and RNA26) with their complementary RNA chains and containing blunt ends or a two-nucleotide 5′-overhang (not shown). Thus, SSO1404 is a single-stranded RNA-specific endoribonuclease.
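To make the notion of a cleavage site "located between two Us" concrete, the sketch below lists every U/U junction in an RNA string. It only enumerates candidate positions and does not model the enzyme's actual site choice. The function name and the example sequence are hypothetical; the Table 1 sequences are not reproduced here.

```python
# Illustrative sketch: list candidate U/U junctions in an ssRNA sequence.
# The example sequence is HYPOTHETICAL and is not one of the Table 1 substrates.


def uu_sites(rna: str) -> list[int]:
    """Return 1-based positions i such that nucleotides i and i+1 are both U."""
    seq = rna.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] == "U" and seq[i + 1] == "U"
    ]


if __name__ == "__main__":
    example = "GACAUUGCUUUAC"  # hypothetical sequence, for illustration only
    print(uu_sites(example))  # [5, 9, 10]
```

A position i in the output corresponds to a potential cut between Ui and Ui+1, using the same numbering convention as "between U9 and U10" in the text.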
Although several substrates shown in Fig. 3 (RNA2, RNA5, and RNA12) contain a potential SSO1404 cleavage site within the predicted loops, the enzyme cleaved these substrates only in close proximity to the 5′- or 3′-ends, suggesting that it requires the presence of a free RNA end. To determine whether a stem-loop is required as an entry site for SSO1404, we tested its activity against three truncated oligoribonucleotides (16 nt) that lack the potential to form stem-loops and contain the SSO1404 cleavage site from RNA5 (RNA18), RNA11 (RNA19), or RNA9 (RNA20). SSO1404 efficiently cleaved all three substrates at the same position as in the corresponding longer substrates (Table 1), indicating that the primary sequences of these short oligoribonucleotides were sufficient to direct site-specific cleavage by SSO1404, and no specific secondary structure was required. To determine the minimal size of the SSO1404 substrate, the 39-nt-long RNA5 was further trimmed to 16-nt RNA18, 13-nt RNA21, 10-nt RNA22, 8-nt RNA23, and 6-nt RNA24 (Table 1). SSO1404 efficiently cleaved the four longer substrates (RNA5, RNA18, RNA21, and RNA22) but showed no activity against RNA23 (8 nt) and RNA24 (6 nt) (Fig. 4). Thus, oligoribonucleotides as short as 10 nt can serve as SSO1404 substrates.
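The minimal substrate size stated above can be read directly off the truncation series. The sketch below tabulates the lengths and outcomes reported in the text and picks the shortest cleaved substrate; the variable and function names are hypothetical and serve only this illustration.

```python
# Illustrative sketch: find the shortest substrate that was still cleaved.
# Lengths and outcomes are those reported in the text for the RNA5 truncations.

truncation_series = {
    "RNA5": (39, True),
    "RNA18": (16, True),
    "RNA21": (13, True),
    "RNA22": (10, True),
    "RNA23": (8, False),
    "RNA24": (6, False),
}


def shortest_cleaved(series: dict[str, tuple[int, bool]]) -> tuple[str, int]:
    """Return (name, length) of the shortest substrate with detectable cleavage."""
    cleaved = [(length, name) for name, (length, ok) in series.items() if ok]
    if not cleaved:
        raise ValueError("no cleaved substrates in series")
    length, name = min(cleaved)
    return name, length


if __name__ == "__main__":
    print(shortest_cleaved(truncation_series))  # ('RNA22', 10)
```

Because 8-nt and 6-nt substrates were not cleaved, the data bracket the minimal substrate size between 9 and 10 nt, with 10 nt being the shortest length actually shown to work.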
An ssRNA endoribonuclease activity with a preference for U-rich regions has also been observed in the abortive phage infection determinant AbiB from Lactococcus lactis and in the mRNA interferases MazF-mt1 and MazF-mt6 from M. tuberculosis (41,42). The L. lactis AbiB prevented growth of the sensitive phage bIL170 through the selective degradation of phage mRNAs by endonucleolytic cleavage at U/U, A/U, and U/A sites (41). The mRNA interferases are toxin components of chromosomal TA modules that are abundant in free-living prokaryotes and induce reversible cell cycle arrest or programmed cell death in response to starvation or other stress conditions (43-46). These enzymes have different mRNA cleavage specificities, and their expression in the cell causes effective inhibition of protein synthesis, leading to temporary cell growth arrest. In this regard, it might be relevant that U-rich and AU-rich regions have been identified upstream of the Shine-Dalgarno sequence in prokaryotic and phage mRNAs and shown to enhance translation (47,48). The CAS2 proteins are unrelated to mRNA interferases or AbiB but, as mentioned above, appear to be homologous to VapD, a component of a distinct class of TA systems. Thus, CAS2 and VapD might represent a novel group of mRNA-specific endoribonucleases.
Crystal Structure of SSO1404-To elucidate the molecular basis for the ssRNase activity of CAS2, we determined the crystal structure of SSO1404 at 1.6 Å resolution. The structure demonstrated that SSO1404 is a homodimer (Fig. 5A). This is consistent with the results of our gel filtration experiments, which showed that the native protein has a molecular mass of 23.7 ± 0.6 kDa (predicted molecular mass 11.9 kDa). In the SSO1404 dimer, two monomers are joined by their β-sheets, and the α-helices are exposed on the surface. In the SSO1404 monomer, the four β-strands form an antiparallel β-sheet, and the α-helices are packed on one side, and the overall structure may be described as a double-split β-α-β fold. In addition, the last β-strand (β5) of each monomer interacts with the β-sheet of the other monomer, creating two-joint, five-strand, antiparallel β-sheets (Fig. 5A).
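The gel filtration result supports a dimer because the native mass is almost exactly twice the predicted monomer mass. The sketch below shows this arithmetic; the function name is hypothetical, and rounding the ratio to the nearest integer is a simplifying assumption made for illustration.

```python
# Illustrative sketch: estimate the number of subunits from gel filtration data.
# Masses are those quoted in the text; nearest-integer rounding is an assumption.


def estimated_subunits(native_kda: float, monomer_kda: float) -> tuple[float, int]:
    """Return (mass ratio, ratio rounded to the nearest whole subunit count)."""
    if native_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    ratio = native_kda / monomer_kda
    return ratio, round(ratio)


if __name__ == "__main__":
    # 23.7 kDa native mass versus 11.9 kDa predicted monomer mass.
    ratio, subunits = estimated_subunits(23.7, 11.9)
    print(f"{ratio:.2f} -> {subunits} subunits")  # 1.99 -> 2 subunits
```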
Both Dali and VAST searches also recognized various proteins with the ferredoxin-like fold as being structural relatives of each of the three CAS2 structures with similar, moderate scores and r.m.s.d. values (supplemental Table 1), suggesting that CAS2 proteins comprise a new superfamily within the ferredoxin-like fold. Indeed, this is how the available CAS2 structures have been classified in the SCOP database.
The ferredoxin-like domains constitute one of the most populated protein folds, with numerous structural and functional derivatives (53). In particular, this fold is present in numerous RNA-binding proteins, including the RNA-binding domain, the anticodon-binding domain of PheRS, ribosomal proteins S6 and S10, and also the prominent components of the CAS system, the RAMP superfamily proteins, which possess a duplicated ferredoxin-like domain (17,54). However, to our knowledge, no nucleases with the ferredoxin-like fold have been characterized, with the possible exception of the IS200 transposase of S. solfataricus (SSO1474, Protein Data Bank code 2f5g). Like many ribosomal proteins (55), SSO1404 exposes several aromatic and hydrophobic residues to the solvent (Ile-32, Tyr-34, Ile-75, Ile-84, Val-85, Ile-86, Phe-80, and Tyr-88), and some of these residues are conserved in large subsets of the CAS2 family proteins (supplemental Fig. 1), suggesting their involvement in the interaction with the bases of RNA. The extended loop α2-β4 of SSO1404 could also be a candidate for interaction with RNA.
... 3, 5, 7, 9, and 11) or in the presence (lanes 2, 4, 6, 8, 10, and 12) of SSO1404 (30 μg/ml). Reaction products were processed and visualized as described under "Experimental Procedures." Lanes T1 and Im represent the products of the RNA5 cleavage by RNase T1 and 2 M imidazole (pH 7.0), respectively.
Mutational Analysis of SSO1404 and the Potential Catalytic Site-The structure of the SSO1404 dimer revealed several major cavities or grooves located at the interface of the two monomers, which might represent potential catalytic sites (Fig. 6A). The most prominent cavity consists of the deep cleft between two monomers accommodating the side chains of conserved Asp-10, which is bracketed by two α2-β4 loops (Fig. 6A). To identify the active site residues involved in RNA cleavage, we performed alanine replacement mutagenesis of the conserved residues of SSO1404. A multiple sequence alignment of CAS2 proteins (17) identified several conserved amino acids (Fig. 7; supplemental Fig. 1). The single conserved aspartate, Asp-10, was of special interest as the putative principal catalytic residue. In the SSO1404 structure, Asp-10 is located at the end of the first β-strand and is preceded by the conserved tyrosine Tyr-9. Several other partially conserved residues are spatially close to Asp-10, including Arg-17, Arg-19, and Arg-31. Alanine replacement mutagenesis of SSO1404 revealed that Asp-10, Arg-31, and Phe-37 are strictly required for the catalytic activity, whereas Y9A, R17A, and R19A showed very low activity (Fig. 8). The T12A and D65A mutants showed an ~2-fold drop in activity, whereas the other mutant proteins exhibited slightly reduced (Q33A, Y34A, and S35A) or wild-type level (D13A, D14A, N18A, and R67A) activity.
In the SSO1404 structure, five residues that are important for activity (Tyr-9, Asp-10, Arg-17, Arg-31, and Phe-37) are located in the long cavity formed by the α1 helix on one side, the β2 and β3 strands on the other side, and the β1 strand at the bottom (Fig. 6B). SSO1404 requires a divalent metal cation for activity, Mg2+ being the optimal metal (Fig. 2B). In most known RNases, two or three conserved acidic residues are critical for catalysis and are involved in the coordination of one or two metal cations (Mg2+ or Mn2+), which activate a nucleophilic water molecule for hydrolysis of the phosphodiester bond or stabilize the transition state in cleavage reactions (56). The structures of SSO1404 and two other CAS2 proteins (TT1823 and PF1117) revealed no metal atoms bound to the protein. Nevertheless, given that SSO1404 required the addition of Mg2+ or Mn2+ for activity and the conserved Asp-10 was critical for catalysis, we suggest that this residue is involved in the coordination of the metal cation in SSO1404 and other CAS2 proteins. Interestingly, the side chains of Asp-10 in the two monomers are oriented toward the active site of the other monomer and are positioned 6.5 Å apart (Fig. 6B), suggesting that these residues do not form two independent active sites. In addition, these aspartates are located inside a deep, narrow cleft, making them inaccessible for direct interaction with the RNA substrate (Fig. 6A). Therefore, we propose that the side chains of the two Asp-10 bind one Mg2+ atom, which might coordinate a nucleophilic water molecule. As shown here (Fig. 2C), SSO1404 generates products terminating in a 3′-hydroxyl and a 5′-phosphate, suggesting that, like in the E. coli RNase E, the reaction involves an attack of a water molecule on the susceptible phosphodiester bond followed by scission of the 3′-oxygen-phosphorus bond (57). Therefore, the catalytic mechanism of SSO1404 probably involves the in-line nucleophilic attack on the scissile phosphate by an activated hydroxyl group generated from a water molecule that is coordinated to magnesium. The SSO1404 D65A protein had a 2-fold reduced activity (Fig. 8). This conserved aspartate is located on the long α2-β4 loop (Fig. 6, A and B) that varies in length among CAS2 proteins and accommodates several partially conserved charged residues (Glu-64, Asp-65, Glu-66, and Arg-67 in SSO1404) (Fig. 7). Based on the structure of the SSO1404 dimer (Fig. 6, A and B), we suggest that the α2-β4 loops are responsible for the recognition of RNA substrates and might determine the substrate selectivity of different CAS2 proteins. We propose that like the E. coli RNase E S1 domain (58), the SSO1404 α2-β4 loop might function to clamp down the RNA substrate in the active site, orienting its phosphate backbone for nucleophilic attack by the magnesium-coordinated activated water molecule.
Conclusions-This work is the first step toward a comprehensive biochemical and structural characterization of CAS proteins. Recent bioinformatic analyses suggest that the CRISPR and CAS proteins might use several dissimilar mechanisms to abrogate phage infection and that at least one of these mechanisms could resemble eukaryotic RNA interference (3,17). This study shows that SSO1404 and other CAS2 proteins represent a novel family of endoribonucleases, which possess a ferredoxin-like fold and are specific to ssRNA substrates. SSO1404 cleaves ssRNAs preferentially within U-rich regions and, in this respect, resembles the phage abortive infection endoribonuclease AbiB from L. lactis (41). A similar mechanism of selective degradation of phage transcripts might be proposed for SSO1404 and other CAS2 proteins as well.
Under this hypothesis, CAS2 would be the functional analog of the eukaryotic slicer nuclease, a function that is performed by the PIWI domains that are unrelated to CAS2 (59). Considering the apparent mechanism of CRISPR-associated anti-phage defense that involves integration of sequences homologous to fragments of phage mRNAs into the CRISPR loci, the CAS2 endoribonuclease activity might contribute to this process as well. Another possible role of CAS2 proteins in antiphage defense might be associated with the global inhibition of translation by mRNA cleavage, a mechanism that has been proposed for RelBE and several other TA systems (MazEF, PemIK, and ChpBIK) that contain RNase components (mRNA interferases) (45,60,61). These TA systems play important roles in stress response to nutritional limitations or DNA damage, and their expression results in growth arrest or programmed cell death (43-45). Moreover, it has been shown that the E. coli MazEF TA module prevents multiplication of the phage P1 by promoting cell death (62). A similar mechanism of anti-phage response appears to be a possibility for CAS2, especially in light of the previously described relationship with VapD, an uncharacterized protein that is functionally linked to the VapBC toxin-antitoxin system (17). S. solfataricus and many other CRISPR-containing organisms have at least two CAS2 proteins that might have different sequence specificities and could target distinct sets of mRNAs.
FIGURE 6. Potential active site of SSO1404. A, surface representation (two orientations) of the SSO1404 dimer showing the potential active site: the deep cavity between two monomers harboring two conserved Asp-10 (colored red). One of the monomers is colored in cyan, and the other is in green. Residues colored orange (Gln-63, Glu-64, and Asp-65) are located on the two α2-β4 loops, which are predicted to be involved in RNA binding. Other conserved residues important for activity are shown in yellow (Tyr-9, Arg-17, Arg-19, Arg-31, and Phe-37). B, stereo view of the SSO1404 dimer showing the positions of the residues important for activity (Tyr-9, Asp-10, Arg-17, Arg-19, Arg-31, and Phe-37). One of the monomers is colored in cyan, and the other is in green. The orientation of the SSO1404 dimer in B corresponds to the right model of A. Note the proximity of the two principal catalytic Asp-10 residues in the dimer (6.5 Å).
FIGURE 7. Structure-based sequence alignment of SSO1404 and selected CAS2 proteins. Residues conserved in all CAS2 proteins are highlighted in black, highly conserved residues in dark gray, and similar residues in light gray. The secondary structure elements derived from the structure of SSO1404 (Protein Data Bank code 2i8e) are shown above the alignment. The asterisks designate the residues important for the catalytic activity of SSO1404. The compared proteins are as follows: SSO1404 (Q97YC2), SSO8090 (Q97Y85), TT1823 from T. thermophilus (Q746F4), PF1117 from P. furiosus (Q8U1T8), AF1876 from A. fulgidus (O28403), MTH1083 from M. thermoautotrophicum (O27155), TM1796 from T. maritima (Q9X2B6), and NE0845 from N. europaea (Q82W51). The sequences were aligned using both sequence and structural information by 3DCOFFEE (63), and the figure was generated using TEXSHADE (64).