Crystal Structure of the Largest Subunit of a Bacterial RNA-guided Immune Complex and Its Role in DNA Target Binding

Background: Cascade is a prokaryotic immune complex that targets foreign DNA.
Results: We present the crystal structure of CasA, the largest subunit of Cascade, and analyze its contribution to DNA target binding.
Conclusion: CasA is necessary for the DNA target binding by Cascade.
Significance: Learning how each subunit of Cascade functions is crucial for understanding how this immune complex works.

Prokaryotes make use of small RNAs encoded by CRISPR (clustered regularly interspaced short palindromic repeat) loci to provide immunity against bacteriophage or plasmid invasion. In Escherichia coli, the CRISPR-associated complex for antiviral defense (Cascade) utilizes these RNAs to target foreign DNA for destruction. CasA, the largest subunit of Cascade, is essential for its function. Here we report the crystal structure of Thermus thermophilus CasA. The structure is composed of two domains that are arranged in a chair-like conformation with a novel fold forming the larger N-terminal domain. Docking of the crystal structure into cryo-electron microscopy maps reveals two loops in CasA that likely have important functions in DNA target binding. Finally, DNA binding experiments show that CasA is essential for binding of Cascade to DNA target.

In response to attack by invasive genetic elements, such as bacteriophages or plasmids, prokaryotes integrate short fragments of foreign DNA into their chromosome at sites of clustered regularly interspaced short palindromic repeats (CRISPRs) (1-3). RNA transcribed from these CRISPR arrays is cleaved within each repeat sequence to yield a library of small CRISPR-derived RNAs (crRNAs), each of which contains complementary sequence to the invading element (4-10). After packaging into CRISPR-associated (Cas) protein complexes, mature crRNAs act as guides that detect and mediate destruction of foreign nucleic acid (2, 10-16). For a recent review of CRISPR/Cas biology, see Ref. 17.
The CRISPR-associated complex for antiviral defense (Cascade) mediates DNA target recognition by the CRISPR/Cas system in Escherichia coli (10). Cascade is a 405-kDa ribonucleoprotein complex composed of five functionally essential protein subunits in unequal stoichiometry (CasA₁, CasB₂, CasC₆, CasD₁, and CasE₁) and a 61-nucleotide crRNA (13). Cascade binding to DNA target not only requires base-pairing between the crRNA and its complementary sequence in the DNA target (the protospacer) but also depends on a protospacer adjacent motif (PAM) (18). In E. coli targets, the 3-nucleotide PAM sequence (5′-CAT-3′) is found upstream of the protospacer (18). CRISPR arrays lack a PAM sequence, ensuring that the host genome is not targeted and thus serving as a mechanism for self versus nonself recognition (18-22). Once bound to target, Cascade is believed to recruit the Cas3 nuclease, which then destroys the DNA target (10, 23-25).
Two cryo-electron microscopy (cryo-EM) reconstructions of E. coli Cascade, one bound to a single-stranded target that spans the protospacer but not the PAM sequence (the protospacer target) and the other unbound, have provided detailed insight into the organization of its subunits (15). Overall, Cascade has a sea horse-shaped architecture (see Fig. 1A) with a backbone formed by a helical filament of six copies of CasC wrapped around the extended crRNA. The head is capped by CasE at the 3′-end of the crRNA, and CasA and CasD cap the tail at the 5′-end. Two copies of CasB sit on the inner surface of the CasC-crRNA spine, directly connecting the head and the tail of the complex. The reconstruction of Cascade bound to protospacer target reveals a concerted conformational change involving CasA, CasB, and CasE that could be a signal to recruit Cas3 (15).
Despite recent advances in our understanding of the organization and biochemistry of Cascade (6, 8, 10, 13, 15, 18), the functions of many of its subunits are still poorly understood. To better understand the role of CasA, we have determined the crystal structure of CasA, docked this structure into cryo-EM reconstructions (15), and compared the DNA binding properties of Cascade with those of a subcomplex of Cascade lacking the CasA subunit (CasBCDE).

EXPERIMENTAL PROCEDURES
Cloning and Protein Expression-The cloning and expression strategy was similar to that described previously (10, 13). Thus, all genes were amplified from genomic DNA (American Type Culture Collection) and directionally cloned into a series of expression vectors (Table 1). An E. coli CRISPR array consisting of seven identical spacers (sequence: 5′-CCAGTGATAAGTGGAATGCCATGTGGGCTGTC-3′) was synthesized by GeneArt. All proteins were overexpressed in the T7Express strain of E. coli (New England Biolabs). Cells were grown in LB medium, supplemented with the appropriate antibiotic(s) (Table 1), at 37°C to an A600 of 0.3-0.5, and subsequently protein expression was induced with 0.2 mM isopropyl-β-D-thiogalactopyranoside overnight at 20°C.
Purification of E. coli Proteins-E. coli CasA, Cascade, and the CasBCDE-crRNA subcomplex were all purified using the same protocol. Harvested cells were lysed in buffer L (20 mM Tris-HCl, pH 8.0, 100 mM NaCl and 10% glycerol), clarified, and then loaded onto a 5-ml immobilized metal affinity chromatography column (Bio-Rad). The column was then washed with 10 mM imidazole before the protein of interest was eluted with 250 mM imidazole. N-terminal tags were removed by treatment with tobacco etch virus (TEV) protease overnight at 4°C. Samples were then desalted to remove imidazole and then reapplied to immobilized metal affinity chromatography resin to remove the His-tagged TEV protease, any cleaved tag, or any remaining tagged protein. Samples were then concentrated and loaded on a HiLoad 26/60 S200 size-exclusion column (GE Healthcare) pre-equilibrated with Buffer A (20 mM Tris-HCl, pH 8.0, 200 mM NaCl, and 1 mM tris(2-carboxyethyl)phosphine). As seen previously, all proteins eluted as symmetrical peaks at their expected molecular weights (13).
Purification of Thermus thermophilus CasA-Harvested cells were lysed in buffer L, and the clarified lysate was heat-treated at 70°C for 10 min. Following centrifugation, the sample was adjusted to 1.5 M ammonium sulfate and loaded onto a 5-ml Fast Flow Phe column (GE Healthcare) pre-equilibrated with 40 mM Tris-HCl, pH 7.5, 1.5 M ammonium sulfate, 10% glycerol. Protein was eluted with a linear gradient of 1.5-0 M ammonium sulfate. The relevant fractions were pooled, and the protein was further purified over a 5-ml Fast Flow Q column (GE Healthcare) before finally being loaded on a HiLoad 26/60 S200 column (GE Healthcare) pre-equilibrated with Buffer A. The final purified protein was concentrated to ~30 mg/ml using an Ultracel 10K centrifugal filter unit (Millipore).
Structure Determination-X-ray diffraction data were collected at either beamline 9.2 at the Stanford Synchrotron Radiation Lightsource (SSRL) or beamline X25 at the National Synchrotron Light Source (NSLS). Data were processed with HKL2000 (26). SHELX (27) was used to find the positions of the platinum sites. Phases were calculated using SOLVE (28) and improved by solvent flattening and noncrystallographic symmetry averaging in RESOLVE (28). Iterative model building and refinement were carried out in COOT (29) and PHENIX (30).

Cryo-electron Microscopy Map Fitting and Preparation of Figures-Rigid-body docking of the T. thermophilus CasA crystal structure into the cryo-electron microscopy density of E. coli Cascade was performed with Chimera (31). All structure panels were generated using PyMOL (32) or Chimera (31).
DNA Binding Experiments-Binding assays contained 20 mM Tris-HCl, pH 8.0, 100 mM NaCl, and 10% glycerol. All oligonucleotides were gel-purified. dsDNA was made by annealing oligonucleotide A (5′-TCAATCTACAAAATTGAGCAAATCAGACAGCCCACATGGCATTCCACTTATCACTGGCATTGCTTTCGAGCTTGCCGATCAGCTT-3′) with oligonucleotide B (5′-AAGCTGATCGGCAAGCTCGAAAGCAATGCCAGTGATAAGTGGAATGCCATGTGGGCTGTCTGATTTGCTCAATTTTGTAGATTGA-3′). Trace amounts (5-200 pM) of 5′-end ³²P-labeled dsDNA were incubated with increasing concentrations of Cascade or CasBCDE for 1 h at 37°C prior to electrophoresis through a 5% polyacrylamide gel. In experiments with saturating CasA, 250 nM was confirmed to be saturating because repeating these experiments with 1 μM CasA (data not shown) gave the same results. DNA was visualized by phosphorimaging and quantified using Image Gauge (Fuji). As described before (18), the fraction of DNA bound was plotted versus protein concentration and fit to a one-site binding isotherm using GraphPad Prism software. Reported Kd values are the average of three replicates.
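As a consistency check on the annealed substrate, the short Python sketch that follows tests whether oligonucleotide B is the exact reverse complement of oligonucleotide A and where the spacer sequence from the CRISPR array falls within the duplex. The sequences are those printed in the text with the line-break hyphens removed; the function names are illustrative and are not part of the original methods. Which strand the PAM is conventionally read from is not restated here, so the sketch only reports the three nucleotides flanking the spacer match and their complement.

```python
# Sequences as printed in the text, with line-break hyphens removed.
OLIGO_A = ("TCAATCTACAAAATTGAGCAAA"
           "TCAGACAGCCCACATGGCATTCCACTTATCACTGGC"
           "ATTGCTTTCGAGCTTGCCGATCAGCTT")
OLIGO_B = ("AAGCTGATCGGCAAGCTCGAAA"
           "GCAATGCCAGTGATAAGTGGAATGCCATGTGGGC"
           "TGTCTGATTTGCTCAATTTTGTAGATTGA")
SPACER = "CCAGTGATAAGTGGAATGCCATGTGGGCTGTC"  # spacer of the synthetic CRISPR array

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_spacer_match(strand: str, spacer: str):
    """Find the spacer-identical sequence on a strand.

    Returns (start, end, flank), where flank holds the (up to) three
    nucleotides immediately 5' of the match, or None if there is no match.
    """
    start = strand.find(spacer)
    if start < 0:
        return None
    end = start + len(spacer)
    return start, end, strand[max(0, start - 3):start]


if __name__ == "__main__":
    print("length A, B:", len(OLIGO_A), len(OLIGO_B))
    print("B is reverse complement of A:", reverse_complement(OLIGO_A) == OLIGO_B)
    hit = locate_spacer_match(OLIGO_B, SPACER)
    if hit is not None:
        start, end, flank = hit
        print("spacer match on B:", start, end)
        print("5' flank on B:", flank, "| complement on A:", reverse_complement(flank))
```

For the sequences as printed, both oligonucleotides are 85 nt long and fully complementary, and the spacer-identical sequence on oligonucleotide B is flanked on its 5′ side by ATG, which reads 5′-CAT-3′ on the complementary strand.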

RESULTS
Crystal Structure of CasA-To gain a more detailed understanding of CasA (also known as CRISPR-subtype E. coli 1 (Cse1) or YgcL), we determined its crystal structure. Initial attempts to crystallize the E. coli protein were unsuccessful. We therefore expressed and purified the homolog from T. thermophilus HB8 (TthCasA). We chose this organism because the sequence of TthCasA has ~50% similarity with E. coli CasA, and crystal structures of both TthCasB (33) and TthCasE (34) have been determined. Crystals of TthCasA were obtained by vapor diffusion using a precipitant solution containing sodium acetate. The crystals belonged to the space group P2₁ (a = 93.9 Å, b = 47.9 Å, c = 129.2 Å, α = γ = 90°, β = 97.52°) and contained two monomers in the asymmetric unit. The structure was determined by single isomorphous replacement, utilizing platinum-soaked crystals, and the structure was refined to 2.4 Å resolution with an Rwork of 19.4% and an Rfree of 25.3%. Additional data collection, phasing, and refinement statistics are given in Table 2. The final model displayed good geometry and contained all of the TthCasA sequence with the exception of the first 4 N-terminal and last 6 C-terminal residues, as well as two internal loops formed by residues 129-142 (N-loop) and residues 405-409 (C-loop). The structures of the two monomers of TthCasA in the asymmetric unit are virtually identical with a root mean square deviation of 0.1 Å over 473 Cα atoms.
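The reported cell constants and asymmetric unit content can be checked for plausibility with a back-of-the-envelope packing estimate. The Python sketch that follows computes the monoclinic cell volume, V = a·b·c·sin β, and the resulting Matthews coefficient and solvent fraction. Two inputs are assumptions that do not come from the text: the molecular mass is approximated as 502 residues × 110 Da per residue, and the solvent fraction uses the standard 1 − 1.23/Vm approximation. The two asymmetric units per cell follow from the P2₁ space group.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Volume (cubic angstroms) of a monoclinic cell with alpha = gamma = 90 degrees."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume: float, asu_per_cell: int,
                         molecules_per_asu: int, mass_da: float) -> float:
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (asu_per_cell * molecules_per_asu * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


# Cell constants as reported in the text.
A, B, C, BETA = 93.9, 47.9, 129.2, 97.52
ASU_PER_CELL = 2          # P2(1) has two symmetry-equivalent positions
MOLECULES_PER_ASU = 2     # two monomers in the asymmetric unit (from the text)
# ASSUMPTION: mass estimated as 502 residues x 110 Da; not a value from the paper.
ASSUMED_MASS_DA = 502 * 110.0

volume = monoclinic_cell_volume(A, B, C, BETA)
vm = matthews_coefficient(volume, ASU_PER_CELL, MOLECULES_PER_ASU, ASSUMED_MASS_DA)

if __name__ == "__main__":
    print(f"cell volume: {volume:.0f} A^3")
    print(f"Matthews coefficient: {vm:.2f} A^3/Da")
    print(f"approximate solvent fraction: {solvent_fraction(vm):.2f}")
```

Under the assumed mass, the estimate gives a Matthews coefficient of roughly 2.6 Å³/Da, that is, about half solvent, which is in the range typical of protein crystals and consistent with two monomers per asymmetric unit.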
Overall the structure of TthCasA can be divided into two domains corresponding to the N- and C-terminal parts of the polypeptide chain (Fig. 1B). The two domains are arranged in a chair-like conformation, with the N-domain forming the seat and the C-domain forming the backrest (Fig. 1B). The larger N-domain includes residues 1-364 and is composed of 11 β-strands and 9 α-helices. A search of the structural database using the DALI server found no significant matches, suggesting that the N-domain has a novel fold. The smaller C-domain includes residues 365-502 and is formed by five α-helices. Four of these α-helices form an up-down-up-down four-helix bundle. The fifth, smaller α-helix is located in a flexible loop (the C-loop) separating the first and second helix of the bundle.
Docking the Crystal Structure of TthCasA into the Cryo-EM Maps of Cascade-To gain further insight into the role of CasA, we rigid-body fit the crystal structure of TthCasA into the cryo-EM maps of Cascade with and without bound protospacer target (15). The crystal structure fit well into both maps, and α-helices in the crystal structure aligned with the corresponding rods of density in the cryo-EM maps (Fig. 1, C and D). The quality of the fit into both cryo-EM maps suggests that the relative orientation of the two CasA domains observed in the crystal structure does not change significantly upon binding to Cascade.
In the cryo-EM map of Cascade with no bound target, the CasA N-domain sits adjacent to CasD, and contiguous cryo-EM density suggests that the N-loop of CasA contacts the 5′-end of the crRNA (Fig. 1C). The C-domain of CasA contacts the fifth and sixth CasC subunits as well as the neighboring CasB subunit (Fig. 1C). Upon binding to protospacer target, the CasA, CasB, and CasE subunits undergo a concerted conformational change (15). In the cryo-EM map of Cascade bound to protospacer target, the movement of CasA is such that the N-loop appears to no longer interact with the crRNA, whereas the C-loop now makes new contacts with the crRNA-protospacer duplex (Fig. 1D).
DNA Binding by Cascade-Binding of Cascade to nonself target relies on the recognition of a PAM (18). Previous studies on the role of CasA in this binding were performed before the E. coli PAM was identified (13). We therefore examined the role of CasA in Cascade binding to an 85-bp dsDNA target containing protospacer and functional PAM sequences. A fixed concentration of dsDNA target was incubated with increasing concentrations of either Cascade or CasBCDE, and complex formation with dsDNA was analyzed by native gel electrophoresis (Fig. 2A). CasBCDE did not bind dsDNA target, whereas Cascade did. The amount of complex formed between dsDNA target and Cascade exhibited a sigmoidal dependence on Cascade concentration (Fig. 2B). There are at least two possible explanations for the sigmoidal binding curve: (i) cooperative binding between multiple sites on Cascade or the dsDNA or (ii) the existence of two equilibria, one between Cascade and dsDNA and the other between a single subunit of Cascade and the rest of the complex. Because the stoichiometry between Cascade and dsDNA target is thought to be 1:1 (13, 15) and CasA was seen to dissociate from Cascade during competitive ssDNA binding experiments (13), the sigmoidal dependence is more likely the result of dissociation of CasA from Cascade at low concentrations. To test this hypothesis, we repeated the above binding experiments in the presence of saturating concentrations of CasA (250 nM). Under these conditions, the amount of complex formed between Cascade and dsDNA target exhibited a hyperbolic dependence on Cascade concentration (Fig. 2B) with an apparent equilibrium dissociation constant (Kd) of 0.54 ± 0.1 nM. The addition of a saturating concentration of CasA to CasBCDE rescued binding of this complex to dsDNA target (Fig. 2A) and also displayed a hyperbolic dependence on CasBCDE concentration with a Kd indistinguishable from that of Cascade in the presence of a saturating concentration of CasA (Fig. 2B).
In control experiments, CasA alone was not able to bind dsDNA target (13) (Fig. 2A).
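The two-equilibria explanation can be illustrated with a minimal numerical sketch. It couples CasA association (CasBCDE + CasA ⇌ Cascade) to the one-site isotherm used for fitting, fraction bound = [Cascade]/([Cascade] + Kd). The Kd of 0.54 nM is the value reported here; the CasA dissociation constant K_CASA_NM is a purely hypothetical number chosen for illustration, since it was not measured in this work. The sketch also assumes trace DNA, so that DNA binding does not deplete Cascade or perturb the CasA equilibrium.

```python
import math

KD_NM = 0.54        # apparent Kd for dsDNA target, from the text
K_CASA_NM = 5.0     # HYPOTHETICAL CasA-CasBCDE dissociation constant (not measured)


def intact_cascade(total_nm: float, added_casa_nm: float = 0.0,
                   k_casa_nm: float = K_CASA_NM) -> float:
    """Concentration of intact Cascade (nM) at equilibrium.

    total_nm is the total CasBCDE concentration, each copy supplied with one
    CasA; added_casa_nm is extra free CasA. Solves
    k_casa = (total - I) * (total + added - I) / I for I, using the
    numerically stable form of the smaller quadratic root.
    """
    b = 2.0 * total_nm + added_casa_nm + k_casa_nm
    product = total_nm * (total_nm + added_casa_nm)
    root = math.sqrt(b * b - 4.0 * product)
    return 2.0 * product / (b + root)


def fraction_bound(total_nm: float, added_casa_nm: float = 0.0,
                   kd_nm: float = KD_NM, k_casa_nm: float = K_CASA_NM) -> float:
    """One-site isotherm evaluated at the intact Cascade concentration."""
    intact = intact_cascade(total_nm, added_casa_nm, k_casa_nm)
    return intact / (intact + kd_nm)


if __name__ == "__main__":
    print("total (nM)  no added CasA  +250 nM CasA")
    for total in (0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0):
        print(f"{total:10.2f}  {fraction_bound(total):13.4f}  "
              f"{fraction_bound(total, added_casa_nm=250.0):12.4f}")
```

In this toy model, the fraction bound rises with roughly the square of the Cascade concentration at low concentrations when no extra CasA is present, which gives a sigmoidal curve on a logarithmic concentration axis. With 250 nM added CasA, nearly all CasBCDE is held in the intact complex and the curve collapses onto a simple hyperbola with half-saturation near Kd. The model is qualitative; reproducing the measured curves would require an experimentally determined CasA dissociation constant.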

DISCUSSION
The crystal structure of TthCasA reveals a two-domain protein with a novel N-terminal fold and a C-terminal four-helix bundle. A prominent feature of this structure is two disordered loops, one in the N-domain and another in the C-domain, termed the N-loop and C-loop, respectively. Docking the crystal structure of TthCasA into the cryo-EM maps of Cascade suggests that these loops become ordered when CasA binds Cascade and that they make significant contacts with the crRNA and the protospacer target. In the absence of target, the N-loop makes contact with the 5′-end of the crRNA, but upon Cascade binding to protospacer target, the N-loop disengages from the crRNA (Fig. 1D) (15). The C-loop makes little or no contact with the crRNA in the absence of target but does make extensive contacts with the crRNA-protospacer duplex when Cascade is bound to protospacer target (Fig. 1D). Thus, both of these loops appear to make key contributions to the specific structural states that correlate with Cascade target binding.
The PAM plays a critical role in self versus nonself recognition (18-22). The PAM is found in nonself DNA targets but not in the host CRISPR loci. Recent DNA binding experiments have demonstrated that mutations in the PAM sequence decrease the affinity of Cascade for DNA target (18), suggesting a direct interaction between Cascade and the PAM. The N-loop of CasA may mediate this critical interaction. If a longer nonself target, including the PAM sequence, were modeled onto Cascade, the projected path of the target would position the PAM adjacent to the site where the N-loop of the crystal structure of CasA docks into the EM map (Fig. 1D).
Our DNA binding experiments show that CasA is essential for specific binding of Cascade to nonself target (Fig. 2). Taken together with the observation that CasA dissociates from the complex at low concentrations, this suggests that CasA expression levels may provide an opportunity for regulation of the activity of Cascade within the cell. Cascade would not be able to bind dsDNA target at low expression levels of CasA, but at high expression levels, Cascade could bind DNA target and signal its destruction by Cas3. Confirmation of this model will require measurement of the cellular concentrations of the individual Cascade subunits.
In summary, we have shown here that the CasA subunit of Cascade is essential for nonself target binding. We present the crystal structure of CasA and its fit into cryo-EM maps of Cascade with and without bound protospacer target. This structural analysis reveals two loops in CasA that are likely key sensors for dsDNA target binding.
While this manuscript was in preparation, a similar analysis of CasA was published by Doudna and colleagues (35). That study independently presents similar results and also experimentally confirms the role of the N-loop in PAM binding and in the control of nonspecific DNA binding by Cascade.