Crystal structures of CRISPR-associated Csx3

Cas10 is the signature gene for type-III CRISPR-Cas surveillance complexes. Unlike type-I and type-II systems, type-III systems do not require a PAM, and target nascent RNA associated with transcriptionally active DNA. Further, target RNA recognition activates the cyclase domain of Cas10, resulting in the synthesis of cyclic oligoadenylate (cOA) second messengers. These second messengers are recognized by ancillary Cas proteins harboring CARF domains, and regulate the activities of these proteins in response to invading nucleic acid. Csx3 is a distant member of the CARF domain superfamily previously characterized as a Mn2+-dependent deadenylation exoribonuclease. However, its specific role in CRISPR-Cas defense remains to be determined. Here we show Csx3 is strongly associated with type-III systems and that Csx3 binds cyclic tetra-adenylate (cA4) second messenger with high affinity. Further, Csx3 harbors cyclic oligonucleotide phosphodiesterase activity that quickly degrades this cA4 signal. In addition, structural analysis identifies core elements that define the CARF domain fold, and the mechanistic basis for ring nuclease activity is discussed. Overall, the work suggests Csx3 functions within CRISPR-Cas as a counterbalance to Cas10 to regulate the duration and amplitude of the cA4 signal, providing an off-ramp from the programmed cell death pathway in cells that successfully cure viral infection.

CRISPR-Cas is a prokaryotic adaptive immune system that is now well known for its repurposed uses in genome editing. Within the prokaryotic domains of life where these systems evolved, a multitude of various classes, types and subtypes are now recognized. In each case, however, the basic mechanism of adaptive immunity is generally described in three distinct stages. Briefly, in stage 1 (adaptation) short fragments of invading DNA known as spacers are inserted into the CRISPR locus. During stage 2 (crRNA biogenesis) the CRISPR locus is transcribed and cleaved into short CRISPR RNAs (crRNAs), and in stage 3 (target interference) crRNA is assembled into nucleoprotein surveillance complexes that survey the cell for complementary DNA (type-I, type-II) or RNA (type-III). Recognition of target DNA or RNA then activates nucleases that destroy the invading nucleic acid, providing immunity. Further, in type-III systems, which recognize RNA associated with transcriptionally active DNA, not only is the RNA degraded, but the DNA as well (1,2).
This process is wholly dependent on a wealth of CRISPR associated (Cas) protein machinery. Cas1, Cas2 and Cas4 are generally involved in adaptation (spacer acquisition), Cas6 frequently in crRNA biogenesis, and diverse groups of type and subtype dependent proteins are utilized in the various stage 3 (target interference) surveillance complexes. Class 1 systems generally utilize multiple Cas proteins to assemble their respective surveillance complexes. Among class 1 systems, the type-I and type-III systems are the most common, and can be differentiated by the presence of Cas3 and Cas10, respectively. Class 2 systems, on the other hand, generally utilize a single large protein subunit to assemble the surveillance complex, and among the class 2 systems, the type-II systems which utilize Cas9 are the most common. In addition, a number of "ancillary" proteins not directly implicated in the above processes are also clearly involved. Among these, proteins containing the CRISPR associated Rossmann Fold, or CARF domain (3) are central to this work.
In 2011 Lintner et al. studied two proteins with N-terminal CARF domains, Csa3 and Csx1, concluding these dimeric CARF domain proteins contained a single ligand binding site that spanned the dimer interface, and that the activities of these proteins would be regulated by a 2-fold (pseudo-) symmetric signal, potentially a cyclic oligonucleotide (4). This hypothesis was subsequently echoed and expanded upon by others, as they identified HEPN nuclease domains in Csx1 and Csm6 similar to those in toxin-antitoxin systems, and suggested the unknown signal might activate these non-specific RNases. Non-specific degradation of cellular mRNAs by Csx1 and Csm6, in turn, is expected to deplete the cellular mRNA pool and inhibit protein synthesis. This may result in the onset of cell dormancy until the infection can be cured, or even induce programmed cell death (3,5).
In 2015 Yan et al. demonstrated manganese-dependent 3'-deadenylase activity for yet another, apparently unrelated, ancillary Cas protein known as Csx3 (6). They also determined crystal structures of Csx3 in the free form (3WZG), in the presence of Mn2+ (3WZH), and in complex with a single stranded RNA fragment (3WZI). Interestingly, the 4-base RNA fragment was found distal from conserved active site histidines on the opposite face of this single domain protein. Topuzlu et al. then recognized Csx3 as a distant member of the CARF domain superfamily, and noted that the 4-base RNA fragment was bound in a nearly circular fashion in the putative regulatory binding site, suggesting a cyclic 4-base RNA as a regulatory ligand for CARF domain proteins (7).
In beautiful work, this hypothesis was then confirmed by both Kazlauskiene et al. (8) and Niewoehner et al. (9). They showed that when type-III surveillance complexes bind target RNA, the Cas10 cyclase domain synthesizes small oligoadenylates, including cyclic tetra- and hexa-adenylate (cA4 and cA6). Further, they also showed these products are recognized by the N-terminal CARF domains of Csx1 and Csm6, where they do indeed activate latent RNase activity within their C-terminal HEPN domains, consistent with the hypothesis that these cOA signals promote cell dormancy, or in the case that the cOA signal is not attenuated, programmed cell death.
Thus, what is the end fate of this cOA; are cellular cOA levels controlled reversibly? The answer to this conundrum was recently elucidated by Athukoralage et al., who found that some archaea express a "ring nuclease" that degrades the cA4 signal into two molecules of A2 (10). Interestingly, this ring nuclease, christened "CRISPR ring nuclease 1" (Crn1), also utilizes a CARF domain for cA4 recognition, but with catalytic residues installed to slowly degrade the cA4 signal.
Overall, these advances in understanding the role of cyclic oligonucleotide signals in CRISPR-Cas prompted us to further investigate Csx3 and its potential interaction with the second messenger cA4.

Results
Csx3 associates with Type-III CRISPR-Cas Systems -In early seminal work on CRISPR-Cas, Haft et al. identified 45 different CRISPR associated (cas) gene families (11). These included core cas genes common to multiple CRISPR types, as well as modular groups of cas genes now known to act in concert. However, clear contextual patterns for some cas families were elusive, and these cas genes were assigned to CRISPR subtype X, or csx. Archaeoglobus fulgidus (DSM4304) possesses both Type-IA and Type-IIIB CRISPR-Cas systems, with csx3 clearly embedded within the Type-IIIB cassette (12). And while it has been noted that type-III systems may include ancillary csx3 genes (13), it is not clear if the reverse is true, i.e., that csx3 is most commonly associated with type-III systems, and if so, in what context. We thus identified putative AfCsx3 orthologs using a protein-protein BLAST search (14) against the non-redundant database. The top 46 unique sequences ranged from 84 to 113 residues in length and showed expect values < 6 × 10⁻¹⁷ with greater than 45% sequence identity to AfCsx3 over at least 75% of the query. Sequences below this threshold included larger, multi-domain proteins that, while containing a Csx3-like domain, are not clear homologs and thus were not included. The sequence data was present in different forms, coming from 17 complete genomes, 7 scaffolds and 23 contigs. While the genomic context was not always clear, especially within contigs, csx3 was present in a type-IIIA cassette in 14 cases and in a type-IIIB cassette in 13 other cases. In just a single case was csx3 found outside a type-III system, where it was instead adjacent to a type-I cassette. In this case, csx3 was at the extreme 5'-end of the contig, and we were unable to analyze the neighboring upstream sequence.
When the genes immediately up- or downstream of csx3 were surveyed, members of the csm6/csx1 family of CARF domain HEPN RNases were the most common, followed by cmr6, cmr4 and cas10 (Supplemental Figure 1). In contrast, except for a lone csa8a gene (type-IA), an immediate type-I or type-II neighbor was not observed. These single domain Csx3 orthologs thus show a clear association with type-III CRISPR-Cas systems. Because type-III (Cas10) surveillance complexes synthesize cOA signals when bound to target RNA, the association of Csx3 with type-III systems is consistent with the proposed recognition of cA4.
Conserved sequence motifs in Csx3 -We next looked for conserved sequence motifs in AfCsx3. A multiple sequence alignment performed with Clustal Omega (15) identified twelve strictly conserved residues (Supplemental Figure 1). These include two isolated residues, Pro 33 and His 80 , and three short motifs; GR(G/A)P(L/F)W 50 , LxHxxH 60 and DPR 71 . When mapped to the AfCsx3 structure, His 80 and the LxHxxH 60 motif localize to the RNase active site identified by Yan et al. (6). In contrast, the GR(G/A)P(L/F)W 50 and DPR 71 motifs map to the tetra-base RNA binding site on the opposite face of the Csx3 dimer (6), and are structurally equivalent to the conserved motifs previously identified in the CARF domains of both Csa3 and Csx1 (4). Thus, the putative cA4 binding site is strictly conserved among this group of Csx3 orthologs.
Csx3 binds cA4 with nanomolar affinity -The presence of Trp 50 among the conserved residues [GR(G/A)P(L/F)W 50 ] in the distal RNA binding site suggested intrinsic tryptophan fluorescence might be used to monitor ligand binding at this site. We utilized fluorescence lifetime spectroscopy for this purpose, taking advantage of an in-house time-resolved fluorescence spectrophotometer to collect subnanosecond-resolved fluorescence emission waveforms. This approach clearly demonstrates a very tight interaction between Csx3 and cA4 (Figure 1, Supplemental Figure 2). We measure an apparent Kd for cA4 of 55 ± 4.5 nM. Notably, this value is less than the concentration of the Csx3 dimer (150 nM) used in the assay, explaining the near stoichiometric (linear) signal at cA4 concentrations below 150 nM. Ideally, the assay would utilize Csx3 concentrations below the Kd, but the sensitivity of the assay does not currently allow this. The apparent Kd thus provides only an upper limit, and could be an underestimate of the true affinity. For perspective, a single cA4 molecule per prokaryotic cell, approximate diameter of 1 µm, gives a concentration of ~ 100 nM. The affinity for cA4 is thus highly relevant, allowing recognition even at concentrations corresponding to a single molecule per cell. Clearly, Csx3 harbors a physiologically relevant cA4 binding site.
We next asked how the affinity for cA4 compares to the affinity for one of the poly-A tailed oligonucleotides originally utilized by Yan et al. For this, we chose the shortest of these, 5'-GGGAAAAGAAAAAA-3' [Supplemental Table 1, "A6" of Yan et al. (6)]. Here, the Csx3 fluorescence lifetime binding assay reveals a much lower affinity, giving a Kd = 1.9 ± 0.2 μM (Figure 1, Supplemental Figure 2). The significantly lower affinity for "A6" provides further evidence that cA4 is indeed a physiologically relevant ligand.
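The near-stoichiometric signal noted above, where the Csx3 dimer concentration exceeds the apparent Kd, can be illustrated with the standard quadratic solution for 1:1 binding that accounts for ligand depletion. The following is a minimal sketch and not the analysis used in this work (which fit the one-site specific binding function); the function names and the sampled cA4 concentrations are illustrative assumptions, while the 150 nM dimer concentration and 55 nM apparent Kd are the values quoted in the text.

```python
import math

def fraction_bound(protein_nM, ligand_nM, kd_nM):
    """Fraction of protein occupied for 1:1 binding, allowing for ligand depletion.

    Solves PL^2 - (P + L + Kd) * PL + P * L = 0 for the complex concentration PL.
    """
    total = protein_nM + ligand_nM + kd_nM
    complex_nM = (total - math.sqrt(total**2 - 4.0 * protein_nM * ligand_nM)) / 2.0
    return complex_nM / protein_nM

def fraction_bound_simple(ligand_nM, kd_nM):
    """Hyperbolic (no-depletion) form, appropriate only when protein << Kd."""
    return ligand_nM / (kd_nM + ligand_nM)

CSX3_DIMER_NM = 150.0  # assay concentration quoted in the text
KD_CA4_NM = 55.0       # apparent Kd quoted in the text

# Hypothetical titration points chosen for illustration only.
for ca4_nM in (25.0, 50.0, 100.0, 150.0, 300.0, 750.0):
    exact = fraction_bound(CSX3_DIMER_NM, ca4_nM, KD_CA4_NM)
    simple = fraction_bound_simple(ca4_nM, KD_CA4_NM)
    print(f"cA4 {ca4_nM:6.0f} nM  depletion-aware {exact:.3f}  hyperbolic {simple:.3f}")
```

Under these assumptions the depletion-aware occupancy lies below the simple hyperbola at every ligand concentration, which is one way to see why a fit that ignores depletion can only place an upper limit on the Kd.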
The Csx3 fold -To elucidate the structure of the Csx3/cA4 complex and conformational changes in the Csx3 homodimer upon binding, we crystallized Csx3 in the presence of cA4. Csx3 crystallized in space group I222 with a single subunit in the asymmetric unit, and usable diffraction data to 1.8 Å resolution. The structure unexpectedly revealed a 7th C-terminal β-strand not present in previously modeled structures of AfCsx3 that runs anti-parallel to the first strand (Figure 2). Further, the additional β-strand is preceded by a C-terminal α-helix (α3) that while present in the previously determined RNA bound structure (3WZI), is absent from the earlier apo-(3WZG) and Mn2+-bound (3WZH) structures (6). Thus, in contrast to previous crystal structures, the structural core of the Csx3 protomer is a mixed 7-stranded β-sheet with β7↑-β1↓-β2↑-α1-β3↑-α2-β4↑-β5↓-β6↑-α3 topology, in which the α-helices form right-handed crossovers between the indicated β-strands (α3 connects β6 to β7). This places α1 and α2 on one face of the sheet, and α3 on the opposite side. In the context of the dimer, α1 and α2 are buried at the subunit interface, while α3 is surface exposed.
It was unclear whether the presence of the additional β-strand was due to the higher resolution of the data set, or whether the C-terminal tail is a dynamic structural element. To address this, we downloaded the 3WZI, 3WZG and 3WZH models and structure factors from the Protein Data Bank. In addition to these 3 models, we generated 3 additional models in which coordinates for residues 98-104 from our superposed I222 structure were included. For 3WZI, where the C-terminal tail is modeled, we also deleted the corresponding residues (98-104) of the original structure. For each of the six models, the bulk of the model is already well refined, and the only difference is whether the I222 β7 tail is absent or present. We then refined each of the 6 structures against the corresponding set of structure factors. In each case, the resulting R-factors were significantly better (lower) for the model lacking β7. Specifically, the respective Rwork/Rfree values for 3WZG, 3WZH and 3WZI in the absence of the I222 tail are 20.57%/23.63%, 21.71%/25.69% and 20.97%/22.81%, but when the I222 tail is included in the refinement, the R-factors rise to 21.20%/24.66%, 22.61%/26.74% and 24.45%/26.29%. The presence of the β7 tail is thus inconsistent with the observed structure factors. In addition, the resulting maps also lack density for the I222 tail, and when the I222 tail is included in the models, instead show strong negative difference density. This clearly suggests the C-terminal tail is indeed mobile, and while present as a seventh β-strand in the new I222 structure reported here, adopts alternate conformations in the previous structures.
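The R-factor comparison above can be tabulated as a quick consistency check. The sketch below simply takes the Rwork/Rfree percentages quoted in the text and reports how much each rises when the I222 β7 tail is included; the dictionary layout and function name are illustrative assumptions, not part of the refinement workflow.

```python
# (Rwork, Rfree) in percent, as quoted in the text.
WITHOUT_TAIL = {"3WZG": (20.57, 23.63), "3WZH": (21.71, 25.69), "3WZI": (20.97, 22.81)}
WITH_TAIL = {"3WZG": (21.20, 24.66), "3WZH": (22.61, 26.74), "3WZI": (24.45, 26.29)}

def r_factor_increase(pdb_id):
    """Return (delta Rwork, delta Rfree) in percentage points on adding the tail."""
    rwork_without, rfree_without = WITHOUT_TAIL[pdb_id]
    rwork_with, rfree_with = WITH_TAIL[pdb_id]
    return (round(rwork_with - rwork_without, 2), round(rfree_with - rfree_without, 2))

for pdb_id in WITHOUT_TAIL:
    delta_rwork, delta_rfree = r_factor_increase(pdb_id)
    print(f"{pdb_id}: Rwork +{delta_rwork:.2f}, Rfree +{delta_rfree:.2f} percentage points")
```

Both Rwork and Rfree rise for all three data sets, with the largest increase for 3WZI.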
Further examination of this I222 structure revealed that β7 is anchored not only by hydrogen bond interactions as it runs antiparallel to β1 of the same subunit, but that it is also stabilized by a parallel β-strand interaction with β6 of a neighboring subunit (Figure 3). Interestingly, this results in formation of an extended, inter-subunit β-sheet running the length of the crystal. Visual observation and PISA interface analysis software (16), however, indicate this intersubunit crystal contact is insufficient to support formation of a stable protein interface in solution. While such open symmetry is common in filamentous protein assemblies, it is atypical for enzyme complexes, which more commonly show closed point group symmetry. Regardless, the dynamic nature of the C-terminal tail and its ability to mediate inter-dimer interactions is reminiscent of 3D domain swapping (17) and may indicate the ability of β1, β6 and the C-terminal tail to mediate additional protein-protein interactions within the cell.
Although Csx3 was co-crystallized with cA4, density for cA4 was not observed in the RNA binding pocket, or elsewhere. While Csx3 is a Mn2+-dependent deadenylase, our crystallization conditions did not contain added divalent cations. Thus, given the extreme affinity of Csx3 for cA4 and the previous structure of Yan et al. with an unidentified RNA fragment (3WZI), this was unexpected.
On the other hand, crystallization conditions did not include chelators and trace metals could have enabled cA4 degradation by the RNase active site. Alternatively, self-regulating metal independent ring nuclease activity has been seen for select CARF domains (18)(19)(20), suggesting Csx3 might also harbor metal independent, cA4 specific ring nuclease activity at the cA4 binding site, as opposed to the metal dependent RNase active site present on the opposite face of the protein.
Ring nuclease activity -For these reasons, we examined the ability of Csx3 to degrade cA4. The activity assay utilized thin layer chromatography to separate unlabeled cA4 (Biolog Life Science/Axxora) from potential degradation products. Using this assay, we observed unexpectedly high levels of ring nuclease activity upon the addition of 200 μM Mn2+ (Figure 4), where the thin layer chromatography was consistent with diadenylate (A2) as the major product (10). At times trace activity levels were observed in samples lacking added Mn2+, perhaps due to low-level copurification of Csx3 with divalent cations, and potentially explaining the lack of bound cA4 in the crystal structure. In general, this activity was inhibited upon addition of EDTA. Importantly, when the progress curve in the presence of 200 μM Mn2+ was modeled as first order exponential decay, the average observed rate constant (kobs) from 5 independent experiments was 4.95 ± 0.43 min⁻¹. For perspective, the rate constant for CRISPR ring nuclease Crn1 identified by Athukoralage et al. in 2018 is 0.089 min⁻¹ (21,22), more than an order of magnitude slower than Csx3 in the presence of Mn2+. From this perspective Csx3 demonstrates robust cyclic oligonucleotide phosphodiesterase, or "ring nuclease" activity.
This suggests cA4 is indeed a physiologically relevant substrate and that Csx3 functions within CRISPR-Cas to regulate the amplitude and duration of the cA4 signal.
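To put the two rate constants in perspective, the first order decay model used for the progress curves, S(t) = S0·exp(-kobs·t), gives a half-life of ln(2)/kobs. The sketch below applies that relation to the rate constants quoted in the text; the function and constant names are illustrative assumptions.

```python
import math

K_OBS_CSX3_PER_MIN = 4.95   # Csx3 with 200 uM Mn2+, as quoted in the text
K_OBS_CRN1_PER_MIN = 0.089  # Crn1, as quoted in the text

def fraction_remaining(k_per_min, t_min):
    """Fraction of substrate left after t_min under first order decay."""
    return math.exp(-k_per_min * t_min)

def half_life_min(k_per_min):
    """Half-life of a first order process, ln(2) / k."""
    return math.log(2.0) / k_per_min

print(f"Csx3 half-life: {half_life_min(K_OBS_CSX3_PER_MIN) * 60:.1f} s")
print(f"Crn1 half-life: {half_life_min(K_OBS_CRN1_PER_MIN):.1f} min")
print(f"Rate constant ratio (Csx3/Crn1): {K_OBS_CSX3_PER_MIN / K_OBS_CRN1_PER_MIN:.0f}")
print(f"cA4 remaining after 1 min with Csx3: {fraction_remaining(K_OBS_CSX3_PER_MIN, 1.0):.3f}")
```

With these values the cA4 half-life is a few seconds for Csx3 and several minutes for Crn1.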

Discussion
Defining elements of the Cyclic oligoAdenylate Recognition Fold (CARF) -With confirmation that Csx3 does indeed bind cyclic tetra-adenylate with high affinity, we can now compare and contrast the structure of this more distant member of the CARF domain superfamily with the canonical CARF domain fold. This is well represented by the relatively simple S. solfataricus Csa3 CARF domain (4), which adopts the classic dinucleotide binding domain fold with one simple modification; the right handed helical crossover between β5 and β6 is lost. The two β-strands are instead connected by a reverse turn, with β6 running antiparallel rather than parallel to β5. This modification has significant consequences. As Csa3 forms the homodimer, the helices connecting β4 and β5 in each subunit pack against and parallel to each other at the center of the dimer interface, with each assuming the place of the "missing" helix in the neighboring subunit. The β5-β6 reverse turn and loss of the connecting helix is thus responsible for creation of the CARF domain dimer interface (Figure 5). And because dimer formation is inherent to recognition of the 2-fold symmetric cOA signals, this is a defining characteristic of the CARF domain fold. In short, the β4↑-α4-β5↑-β6↓ supersecondary structural element in the Csa3 CARF domain lies at the heart of the fold. Critically, these structural elements in Csa3 (Csm6 and Csx1 as well) are topologically analogous to the β3↑-α2-β4↑-β5↓ elements in Csx3, and within the Csx3 dimer they play the same central role at the dimer interface as the cognate elements in Csa3 and other CARF domain proteins.
As the CARF domain acronym (CRISPR Associated Rossmann Fold) went into use before the ligands for this domain were fully identified, it does not well convey the function of this fold. Now that it is understood to bind small cyclic oligonucleotides, it is tempting to give it a more meaningful name. However, the CARF acronym is now in such wide use that renaming it could be difficult. We note however, that this domain clearly serves as a Cyclic oligo-Adenylate Recognition Fold, and this mnemonic could help some to remember its function. One should keep in mind, however, that it also recognizes small linear polynucleotides (6), and it is difficult to imagine that nature has not also adapted this fold for the recognition of other bases.
The C-terminal tail -The previous structures of Csx3 by Yan et al. show a disordered C-terminal tail. In the structure reported here, the C-terminal tail is instead captured in an ordered conformation. The structure shows a well-defined C-terminal α-helix that traces a path along the "back side" of the subunit to position the extreme C-terminus to add an additional β-strand along the outside edge of the first β-strand (Figure 2). Thus, we see that the C-terminal tail not only has a propensity to form these secondary structural elements, but to utilize them, especially β7, to interact with other secondary structural elements within Csx3. However, the dynamic nature of the tail suggests that within the cell, the C-terminal tail might instead be utilized to interact with other cellular components, potentially as a member of a larger complex that might modulate the activity of Csx3. Importantly, in this conformation both β1 and β7 are available for intersubunit interactions.
Recognition of cA4 -The core β3↑-α2-β4↑-β5↓ structural motif is functionally important as well. The conserved GR(G/A)P(L/F)W 50 and DPR 71 motifs identified above lie in the β3α2 and β4β5 loops, respectively, and play a significant role in construction of the Csx3 RNA binding pocket (Figure 5). Similarly, conserved residues are also found at these positions in Csa3 and Csx1, where they are also involved in construction of the conserved ligand binding site (4).
Additional insight into recognition of cA4 by Csx3 can be gained by re-examining the RNA density in the 3WZI structure determined by Yan et al. (6). The 3WZI crystals were grown in the presence of a 37 nucleotide single-stranded RNA molecule, but Yan et al. report that the RNA was degraded over the time course for crystal growth. The 4 ordered nucleotides in the CARF domain binding site were thus modeled as a linear 4-base RNA fragment.
However, perhaps because the density arises from the presence of RNA fragments bound in overlapping conformations with different register, the electron density map clearly shows a cyclic ring of electron density. This suggested the density might be used to inform efforts to model the Csx3/cA4 interaction. Indeed, when we build cA4 into this density, our minimally refined cA4 bound model gives Rwork and Rfree values of 19.2 and 22.4% respectively, which compares very favorably to the original model with the linear 4-base RNA fragment (20.7/23.5%). Thus, while the 3WZI structure does not stem from crystals of Csx3 in complex with cA4, the ring-like density is well modeled by cA4 (Figure 6).
As expected, this model shows the conserved GR(G/A)P(L/F)W 50 and DPR 71 motifs in the β3α2 and β4β5 loops play a central role in cA4 recognition. In addition to these elements, the β2α1 loop is also prominent, corresponding to the conserved β1α1 loop of Csa3. Analysis of the modeled interaction shows that cA4 buries greater than 800 Å² of protein surface. Base A1 (and A3 in the second subunit) is nestled in a hydrophobic pocket between Arg 46 , Tyr 68 and Pro 70 , with potential hydrogen bonds from the main chain NH of Gly 45 , the His 15 side chain and Arg 46 to N1, N6 and N7 of the adenine ring, respectively. Base A2 (and A4) stacks on top of Pro 48 and the edge of Trp 50 , with N6 donating a hydrogen bond to the main chain carbonyl of Ile 22 . The phosphate groups in this model lack substantial interactions, though Arg 46 and Arg 71 could seemingly interact strongly with a relatively minor reorientation of their side chains. Notably, the phosphate groups adjacent to the Arg 46 side chains are elevated above the plane of the cA4 ring while those adjacent to Arg 71 are more deeply buried. Finally, strictly conserved Asp 69 hydrogen bonds to the 2'-OH of the ribose of A2 (and A4).
Structural basis for ring nuclease activity -Yan et al. quite logically proposed a tethered substrate model for Csx3 in which the RNA was bound in the RNA binding site on one face of the Csx3 dimer, and then extended into the active site on the opposite side of the subunit for cleavage. However, this model cannot explain the degradation of cA4, which is far too small. The CRISPR associated HEPN RNases such as Csx1 and Csm6 are two domain proteins that utilize an N-terminal CARF domain to bind cyclic oligoadenylates, which then activate the C-terminal HEPN RNase domains. In some cases, these CARF domains have catalytic residues installed in the cA4 binding site that slowly degrade the cA4 signal in a metal independent mechanism (18)(19)(20). In this sense, the activities of these ancillary Cas proteins are "self-limiting", much like the classic picture for slow GTP hydrolysis by G-proteins.
We might consider a similar model for Csx3. The crystal structure suggests Asp 69 in the cA4 binding site could potentially act as a general base to deprotonate water or the ribose 2'-OH, catalyzing nucleophilic attack on the neighboring phosphorus to cleave cA4, resulting in a product with 3' phosphate groups. However, this mechanism fails to explain the effect of Mn2+ on the reaction, which instead points to the histidine and metal rich RNase active site on the other face of the molecule (6). One alternative solution is to invoke a metal independent RNase active site, though this is inconsistent with the data of Yan et al. (6) and our own ring nuclease data. Another possibility is a mechanism that utilizes both sites. Whether this might involve activation of the metal dependent active site (6) by binding of cA4 at the allosteric regulatory binding site, or some other mechanism is not clear. The structure of the RNase active site seems largely unchanged by the presence of the 4-base RNA fragment, but this fragment was a degradation product, which would not be expected to activate the CARF domain. In any event, we currently lack a satisfying explanation for this activity and additional work is clearly needed.
Utility of unlabeled cA4 ring nuclease and fluorescence lifetime binding assays -Previous work on CARF domain proteins has employed radiolabeled substrates and ligands to measure both ring nuclease activity and binding affinity for cA4. Our work, in contrast, has utilized unlabeled cA4 for both the ring nuclease and fluorescence lifetime binding assays. The "cold" ring nuclease assay reported here is easily adapted for the study of ring nuclease activity in general, and is thus of general use to the field. And while the fluorescence lifetime binding assay is specific to proteins containing a tryptophan residue in the ligand binding site, Trp 50 is strictly conserved among the Csx3 orthologs identified in this study, and the assay is thus applicable to Csx3 in general. Notably, the assay benefits from the stoichiometry of the interaction. With a single cA4 molecule bound across the dimer interface, the change in the fluorescence waveform arises from two copies of Trp 50 for each cA4 bound. This label free binding assay could be utilized for the development of CARF domain agonists and antagonists, which may have potential therapeutic applications in pathogenic bacteria carrying type III CRISPR-Cas systems or other CARF domain proteins.
Physiological Specificity for cA4 -In a significant advance, Csx3 was previously described as a manganese dependent deadenylation ribonuclease, with specificity for poly-A tails in single stranded RNAs (6). While a rate constant for this activity was not reported, we note the assay conditions utilized a 2:1 ratio of Csx3 (20 μM) to substrate RNAs (10 μM), and that under these conditions, significant uncleaved substrate is still seen at the 5-minute time point. In contrast, we have instead used cA4 in excess, with a Csx3:cA4 stoichiometry of 1:15 (10 μM Csx3/150 μM cA4), and under these conditions, cA4 is largely degraded within 5 minutes. This simple comparison suggests Csx3 shows at least 20 times greater specific activity for cA4 than it does for small poly-A tailed RNAs. Further, the dissociation constant for the poly-A tailed RNA reported here (1.9 μM, versus an apparent Kd of 55 nM for cA4) indicates Csx3 shows ~35-fold greater affinity for cA4 than for this small poly-A RNA. Considering both the ~35-fold lower Kd and the at least 20-fold greater catalytic activity, the overall catalytic efficiency of Csx3 is on the order of 700-fold or more greater for cA4 than it is for the poly-A tailed RNA substrate. These are strong arguments that cA4 is indeed the physiologically relevant substrate, and that Csx3 functions within the cell as a cyclic oligonucleotide phosphodiesterase, or ring nuclease.
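The specific activity comparison above can be made explicit by counting substrate equivalents consumed per Csx3 within the 5-minute window. The sketch below is a rough bound under stated assumptions: it treats cA4 as fully consumed ("largely degraded" in the text) and treats complete consumption of the poly-A RNA as an upper limit, since significant uncleaved RNA remained. The function name and the `fraction_consumed` parameter are illustrative and hypothetical.

```python
def turnovers_per_enzyme(substrate_uM, enzyme_uM, fraction_consumed):
    """Substrate molecules consumed per enzyme molecule over the assay window."""
    return fraction_consumed * substrate_uM / enzyme_uM

# cA4 assay: 10 uM Csx3 with 150 uM cA4, assumed here to be fully consumed in 5 min.
ca4_turnovers = turnovers_per_enzyme(150.0, 10.0, fraction_consumed=1.0)

# Poly-A RNA assay of Yan et al.: 20 uM Csx3 with 10 uM RNA. Full consumption is
# an upper bound only, because uncleaved substrate was still present at 5 min.
rna_turnovers_upper = turnovers_per_enzyme(10.0, 20.0, fraction_consumed=1.0)

ratio = ca4_turnovers / rna_turnovers_upper
print(f"cA4: {ca4_turnovers:.1f} turnovers per Csx3 in 5 min")
print(f"poly-A RNA: fewer than {rna_turnovers_upper:.1f} turnovers per Csx3 in 5 min")
print(f"ratio under these assumptions: about {ratio:.0f}-fold or more")
```

Under these assumptions the ratio comes out near 30-fold, in line with the "at least 20 times" estimate given in the text.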
Additional evidence for a physiological role for Csx3 as a ring nuclease comes from consideration of the activities seen for other CRISPR associated ring nucleases (10). Specifically, Csx3 ring nuclease activity is an order of magnitude greater than that seen for Crn1, and is approaching the level of activity seen for virally encoded anti-CRISPR (Acr) ring nucleases (21). If anything, the activity is perhaps higher than expected (23), and its activity may need to be regulated at some level if present in the cell at significant concentrations. In this regard, the ability of the dynamic C-terminal tail to mediate interactions within a larger complex may be extremely relevant.
Overall then, Csx3 possesses the ability to recognize cA4 at levels as low as a single molecule per cell and is also well endowed with ring nuclease activity. Once an infection is cured, and Cas10 ceases production of cA4, these attributes would allow it to quickly and effectively scrub the cell of cA4. Csx3 thus has all the attributes needed to serve as an effective counterbalance to cA4 production by Cas10, allowing it to regulate the duration and amplitude of the cA4 signal. In cases where the infection is cured, the cell can then return to normal metabolic activity, providing an off-ramp from the programmed cell death pathway. Importantly, ring nucleases have yet to be identified in many organisms harboring type III systems. The assignment of this activity to Csx3 thus represents a significant advance in our understanding of CRISPR-Cas in cells utilizing type III CRISPR-Cas systems.

Experimental procedures
Gene Annotations -Protein accessions were matched to their respective genome accessions. In the case of "non-redundant" Csx3 sequences, a representative genome with the highest level of completion was analyzed. In cases where a contig or scaffold was the only available information, analysis was restricted to the assembled read. Genome accessions were downloaded from the GenBank database with annotations from NCBI. In each case, csx3 was already annotated. Genomes were then opened and viewed by eye using the Unipro UGENE software (24). For genomes in which the neighboring genes were already annotated, those annotations were used. In the event that neighboring genes lacked annotation, the amino-acid sequence was analyzed with HHpred (25). Identification was considered conclusive only when the E-value was <0.001 and probability was greater than 95%.
Csx3 expression and purification - The native coding sequence for AfCsx3 with a minimal N-terminal His6-tag (Met-His6) was purchased from Thermo Fisher Scientific and moved into Gateway plasmid pDEST14 using previously described protocols (26) to give the pDEST14-AfCsx3 expression vector. For protein expression, E. coli BL21 (DE3)-pRIL (Stratagene) was transformed with pDEST14-AfCsx3. Typically, a single colony was used to inoculate a 5 ml overnight LB starter culture (100 μg/ml ampicillin and 34 μg/ml chloramphenicol), which was then used to inoculate ZYP-5052 autoinduction media (27) (1:1000) and grown at 20 °C for 48 hours. Cells were harvested by centrifugation at 4,000 x g for 30 min and 5-7 g pellets were stored at -20 °C until needed. Cell pellets were then thawed and resuspended in 5 ml lysis buffer (20 mM Tris-Cl pH 7.5; 500 mM NaCl; 5% glycerol; 1 mM PMSF) per gram of cell pellet and lysed by French Press. The lysate was centrifuged at 25,000 x g for 30 minutes and the supernatant applied to a Ni-NTA affinity column, which was then washed with lysis buffer plus 20 mM imidazole and eluted with lysis buffer plus 500 mM imidazole. Fractions containing Csx3 were then combined and applied to a Superdex S-75 column equilibrated with 10 mM Tris-Cl pH 7.5, 500 mM NaCl, 5% glycerol. Protein concentrations were determined by Bradford assay (28) using Protein Assay Reagent (Bio-Rad) and BSA as a standard. The purity and molecular weight of Csx3 were confirmed by SDS-PAGE.
Standard yields were approximately 25 mg of pure AfCsx3 from a 5 g cell pellet. Fractions containing pure Csx3 were combined and aliquots at 1 mg/ml were stored at -20 °C.
Fluorescence lifetime binding assays - cA4 was titrated (0-750 nM) across a 96-well quartz microplate (Hellma) containing 150 nM Csx3 dimer in 20 mM Tris-Cl pH 7.5, 100 mM NaCl, and 1.0% glycerol (final concentrations) with total well volumes of 350 µl, with three technical replicates per plate for each experiment. Binding assays were also performed with Poly-A Tail RNA (5'-GGGAAAAGAAAAAA-3') as described above using 300 nM Csx3 dimer and 0-20 µM RNA. Wells were excited at λ = 295 nm and time-resolved emission of individual wells followed at λ = 350 nm using a NovaFluor fluorescence lifetime spectrometer [Fluorescence Innovations, (29,30)]. Waveforms were recorded over a total decay time of 128 ns at 2 ps intervals. The fluorescence lifetime data were analyzed as described in the supporting information of Schlick et al. (31) and, more recently and perhaps more succinctly, in the supplemental information of Bernhard et al. (32). Briefly, the waveforms collected from each well were fit to a linear combination of the free (Wf) and complexed (Wc) waveforms, yielding the fraction of protein in complex with ligand at each specific cA4 concentration. The data were then fit to the "one-site specific binding" function in GraphPad Prism (Y = Bmax*X/(Kd + X)).
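The two fitting steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis pipeline used here (which followed refs. 31 and 32 and GraphPad Prism). The function names, the least-squares formulation, and the golden-section search over Kd are all assumptions introduced for illustration. Waveforms are assumed to be equal-length lists of intensities, and X and Kd share the same concentration units.

```python
import math

def fraction_bound(waveform, w_free, w_complex):
    """Least-squares f such that waveform ~ (1 - f) * Wf + f * Wc."""
    diff = [c - f for c, f in zip(w_complex, w_free)]
    num = sum((w - f) * d for w, f, d in zip(waveform, w_free, diff))
    den = sum(d * d for d in diff)
    return num / den

def one_site(x, bmax, kd):
    """One-site specific binding: Y = Bmax * X / (Kd + X)."""
    return bmax * x / (kd + x)

def fit_one_site(xs, ys, kd_lo=0.1, kd_hi=1e4, iters=200):
    """Fit (Bmax, Kd) by least squares.

    For a trial Kd the best Bmax has a closed form, so only Kd is searched
    (golden-section search on log Kd; the bounds are illustrative defaults).
    """
    def best_bmax(kd):
        g = [x / (kd + x) for x in xs]
        return sum(gi * y for gi, y in zip(g, ys)) / sum(gi * gi for gi in g)

    def sse(log_kd):
        kd = math.exp(log_kd)
        b = best_bmax(kd)
        return sum((y - one_site(x, b, kd)) ** 2 for x, y in zip(xs, ys))

    a, b = math.log(kd_lo), math.log(kd_hi)
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    kd = math.exp((a + b) / 2)
    return best_bmax(kd), kd

if __name__ == "__main__":
    # Synthetic, noise-free titration (illustrative values, not measured data).
    conc = [0.0, 25.0, 50.0, 100.0, 200.0, 400.0, 750.0]  # nM
    frac = [one_site(x, 1.0, 55.0) for x in conc]
    print(fit_one_site(conc, frac))
```

Because the model is linear in Bmax, the two-parameter fit reduces to a one-dimensional search over Kd, which keeps the sketch free of external dependencies. A general nonlinear least-squares routine would serve equally well.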
Drops were assembled with 1 µl of protein, 1 µl of well solution and 0.1 µl of 10 mM cA4 (c-tetraAMP, BioLog Lifescience Institute). Crystals grew to a maximum of 150 x 150 x 100 µm over the course of 3-4 weeks. The crystals were looped and flash frozen in liquid N2. X-ray diffraction data were collected at APS-Argonne National Laboratory NE-CAT 24-ID-C. Data were indexed, integrated and scaled in space group I222 using the HKL2000 software package (33). The Phaser-MR module (34) of Phenix (35) was used to determine initial phases for Csx3 using the existing structure of apo-Csx3 (PDB ID 3WZG) determined by Yan et al. (6). The structure was completed by iterative model building with Coot (36) and refinement with phenix.refine (37). The final Rwork and Rfree were 18.7% and 21.6%, respectively. MolProbity was used for model validation (38), with 98% of residues falling in the most favored regions of the Ramachandran plot and none in disallowed regions. Additional statistics on data and model quality, and model refinement are presented in Supplemental  (39).
Cyclic oligonucleotide phosphodiesterase assays - "Ring nuclease" assays were adapted from those of Athukoralage et al., but instead utilized unlabeled cA4 (10). Briefly, cA4 (c-tetraAMP, BioLog Lifescience Institute, final concentration 150 µM) was added to reactions containing 10 µM AfCsx3 in 10 mM Tris-Cl pH 7.5, 200 μM MnCl2, 500 mM NaCl and 5% glycerol (final concentrations) plus SUPERase•In inhibitor and incubated at 50 °C. At the desired time points, 10 µl aliquots were removed from the reaction mixture and quenched with an equal volume of phenol-chloroform. Upon quenching, 20 µl H2O was added to each aliquot to bring the sample volume up to 30 µl aqueous and 10 µl organic. 20 µl of the aqueous portion was then removed and placed in a SpeedVac for 30 minutes or until entirely evaporated. Samples were then re-suspended in 20 µl methanol and centrifuged in a microcentrifuge at 14,000 RPM for 10 minutes to pellet any precipitate. The top 15 µl was removed and again placed in a SpeedVac for 30 minutes or until evaporated, re-suspended in 5 µl methanol, and spotted onto a glass-backed F254 silica gel TLC plate. The TLC plate was developed in 70% ethanol and 0.2 M ammonium bicarbonate (pH 9.3) until the mobile phase reached approximately 75% of the plate height. Plates were dried and imaged with an F254 lamp and imaging box. The intensities of the residual cA4 bands were quantitated using ImageJ software and normalized to time = 0. Using GraphPad Prism, the data were then fit to progress curves as described by Sternberg et al. (40) to determine the first-order rate constants (kobs).
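As a rough illustration of how a first-order rate constant can be extracted from band intensities normalized to time = 0, the sketch below assumes a simple single-exponential decay, fraction remaining = exp(-kobs * t). The exact progress-curve expression of Sternberg et al. (40) is not reproduced in this section, so this form is an assumption. The sketch also substitutes a log-linear least-squares fit through the origin for the nonlinear fit performed in GraphPad Prism. The function names are hypothetical.

```python
import math

def fit_kobs(times, fractions):
    """Estimate kobs assuming fraction = exp(-kobs * t).

    Log-linear least squares through the origin (the data are normalized
    to 1 at t = 0). Points with t <= 0 or a non-positive fraction are
    skipped because they carry no information or cannot be log-transformed.
    """
    pairs = [(t, math.log(y)) for t, y in zip(times, fractions) if t > 0 and y > 0]
    num = sum(t * ly for t, ly in pairs)
    den = sum(t * t for t, _ in pairs)
    return -num / den

def half_life(kobs):
    """Half-life of a first-order process, in the same time units as 1/kobs."""
    return math.log(2) / kobs

if __name__ == "__main__":
    # Synthetic example (illustrative values only): substrate halves every time unit.
    t = [0.0, 1.0, 2.0, 5.0, 10.0]
    remaining = [0.5 ** ti for ti in t]
    k = fit_kobs(t, remaining)
    print(k, half_life(k))
```

A log-linear fit weights late, low-intensity time points more heavily than a direct nonlinear fit does, so with noisy densitometry data the two approaches can give somewhat different kobs values.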

Data availability
The Csx3 model and associated X-ray diffraction data (structure factors) have been deposited in the Protein Data Bank (6VJG). All additional data are contained within the manuscript and supplemental information.

and Fc are the observed and calculated structure factor amplitudes used in refinement. Rfree is calculated as Rwork, but using the "test" set of structure factor amplitudes that were withheld from refinement (9.97%).
c Correlation coefficient (CC) is agreement between the model and 2mFo-DFc electron density map.

A) The fluorescence lifetime of AfCsx3 Trp 50 was used to follow cA4 binding. The mean waveform at each cA4 concentration was modeled as a linear combination of the cA4 bound and free waveforms, yielding the fraction bound. Each point is the mean of three independent experiments, each with three technical replicates. Error bars represent the standard deviation of the mean. The data were then fit to the one-site specific binding function in GraphPad Prism, giving Kd = 55 ± 4.5 nM. Importantly, the indicated cA4 concentrations along the abscissa represent total ligand concentration. Free ligand concentration is significantly less at the lower concentrations of cA4 used in this assay. Thus, the apparent Kd represents only an upper limit, and may well be an underestimate of the affinity. B) Note that the ligand concentrations in panel B are μM rather than nM as in panel A. The fluorescence lifetime binding assay shows significantly lower affinity for the small linear RNA with a poly-A tail, giving a Kd of 1.9 ± 0.2 μM.
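The caveat in panel A about total versus free ligand can be made concrete with the standard quadratic solution for 1:1 binding. The sketch below is illustrative only. It assumes one cA4 site per Csx3 dimer (an assumption, not stated in this legend) and uses the 150 nM dimer concentration from the binding assay and the apparent Kd of 55 nM purely as example inputs. The function names are hypothetical.

```python
import math

def complex_conc(p_total, l_total, kd):
    """Concentration of the 1:1 protein-ligand complex.

    Solves Kd = (P_total - C) * (L_total - C) / C for C (smaller root).
    All quantities must share the same units (nM here).
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0

def free_ligand(p_total, l_total, kd):
    """Free ligand remaining after accounting for the bound fraction."""
    return l_total - complex_conc(p_total, l_total, kd)

if __name__ == "__main__":
    p_total = 150.0  # nM Csx3 dimer, as in the binding assay
    kd = 55.0        # nM, apparent Kd from panel A, used only as an example
    for l_total in (25.0, 50.0, 100.0, 400.0, 750.0):
        print(l_total, free_ligand(p_total, l_total, kd))
```

When the protein concentration is comparable to or greater than Kd, a substantial share of the added ligand is bound at low total ligand, so plotting against total rather than free concentration shifts the curve to the right and inflates the apparent Kd.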

Figure 2. The AfCsx3 Fold. Previous structures of Csx3 reported a 6-stranded mixed β-sheet with 2 α-helices. The high-resolution crystal structure described here revealed an ordered C-terminus with a third α-helix (α3, dark blue) followed by an additional β-strand (β7, violet). This C-terminal extension crosses the outside, solvent-exposed surface of the β-sheet, allowing β7 to add antiparallel to β1. The fold is thus a mixed seven-stranded β-sheet with β7↑-β1↓-β2↑-α1-β3↑-α2-β4↑-β5↓-β6↑-α3 topology, with the α-helices providing right-handed crossovers between the indicated β-strands. In a dimer, α1 and α2 are buried at the subunit interface while α3 is exposed on the surface. Conserved motifs GRXPXW, LXHXXH and DPR are found in the β3-α2, α2-β4, and β4-β5 loops. Note the LXHXXH RNase active site motif that lies at the "bottom" of the protomer, while the GRXPXW and DPR motifs involved in cA4 recognition are at the top.

A) The Csx3 dimer packs in sheets or planes within the crystal. B) The inter-dimer crystal contacts within a plane are anchored primarily by a parallel β-strand hydrogen bond interaction between β7 (violet) of one subunit and β6 (light blue) of the neighboring dimer. The ordered β7-strand of the C-terminal tail thus serves to cross-link dimers within the crystal as it "inserts" between β1 (green) of one subunit and β6 of the neighbor. This interaction leads to the formation of continuous extended β-sheets that run the length of the crystal. Visual inspection and PISA interface analysis indicate these interactions are not substantial enough to support the formation of a stable protein interface in solution. This open symmetry is thus a likely artifact of crystallization conditions and is not indicative of interactions in vivo. That said, the dynamic C-terminal tail clearly has the ability to form interprotein contacts, which might mediate protein-protein interactions within the cell.
Superposition of the β3-α2-β4-β5 sequence of Csx3 (A) onto the β4-α4-β5-β6 sequence of S. solfataricus Csa3 CARF domain (B) (RMSD 2.9 Å) reveals core secondary structures shared by the two proteins (green). Csa3 and Csx3 each contain this contiguous βαββ supersecondary structure, in which the first two β-strands are connected by a right-handed helical crossover followed by a reverse turn into the third β-strand. The core helix in this motif is an essential part of the dimer interface; it packs against the two β-strands connected by the reverse turn, β4β5 in Csx3 (left) and β5β6 in Csa3 (right). In the classic dinucleotide binding domain, these β-strands are instead connected by a right-handed helical crossover, in which the connecting helix occupies the same space as the helix from the adjacent Csa3 or Csx3 subunit. The reverse turn connecting these β-strands is thus a critical structural feature of the CARF domain dimer interface. In addition, Csx3 places β2 and α1 in positions equivalent to the discontinuous β1 and α3 secondary structural elements of Csa3 (grey).

Figure 6. RNA density in the 3WZI Csx3 structure accommodates cA4. A) Stereo image. An omit map was generated using the 3WZI model with RNA atoms deleted and contoured at 1.5 σ (green mesh). The omit map was then used to inform modeling of the cA4/Csx3 interaction. Csx3, oriented as in panel 5A, is represented as a surface model with nitrogen atoms in blue, oxygen atoms in red, and carbon atoms in grey; cA4 is represented as sticks with carbon atoms colored yellow. B) Csx3 secondary structure elements shown in cartoon representation with select side chains shown as sticks. cA4 is also shown in stick representation with carbons in beige.