The small RbcS-like domains of the β-carboxysome structural protein CcmM bind RubisCO at a site distinct from that binding the RbcS subunit

Carboxysomes are compartments in bacterial cells that promote efficient carbon fixation by sequestering RubisCO and carbonic anhydrase within a protein shell that impedes CO2 escape. The key to assembling this protein complex is CcmM, a multidomain protein whose C-terminal region is required for RubisCO recruitment. This CcmM region is built as a series of copies (generally 3–5) of a small domain, CcmMS, joined by unstructured linkers. CcmMS domains have weak, but significant, sequence identity to RubisCO's small subunit, RbcS, suggesting that CcmM binds RubisCO by displacing RbcS. We report here the 1.35-Å structure of the first Thermosynechococcus elongatus CcmMS domain, revealing that it adopts a compact, well-defined structure that resembles that of RbcS. CcmMS, however, lacked key RbcS RubisCO-binding determinants, most notably an extended N-terminal loop. Nevertheless, individual CcmMS domains are able to bind RubisCO in vitro with 1.16 μM affinity. Two or four linked CcmMS domains did not exhibit dramatic increases in this affinity, implying that short, disordered linkers may frustrate successive CcmMS domains attempting to simultaneously bind a single RubisCO oligomer. Size-exclusion chromatography–coupled right-angled light scattering (SEC-RALS) and native MS experiments indicated that multiple CcmMS domains can bind a single RubisCO holoenzyme and, moreover, that RbcS is not released from these complexes. CcmMS bound equally tightly to a RubisCO variant in which the α/β domain of RbcS was deleted, suggesting that CcmMS binds RubisCO independently of its RbcS subunit. We propose that, instead, the electropositive CcmMS may bind to an extended electronegative pocket between RbcL dimers.

shell facets (9–12). A more distantly related protein of the same family, CcmP, forms a double ring of fused protomers and may allow passage of larger metabolites (13,14). A second, unrelated small protein, CcmL (Pfam3319), forms pentameric species that plug the vertices of the shell (15–18). Together, these form a continuous protein barrier perforated only by small pores that appear optimized to preferentially allow passage of anionic species (19).
Ensuring that all required proteins are targeted to the carboxysome in appropriate quantities requires an extensive network of protein-protein interactions. In β-carboxysomes, CcmM seems to be the central actor in organizing the interior proteins. This large, modular protein has an N-terminal domain that is homologous to γ-class carbonic anhydrases. In many cyanobacteria, the γ-carbonic anhydrase-like domain is a functional carbonic anhydrase; in these species, the carbonic anhydrase activity requires formation of a disulfide bond, and the enzyme is only activated once it is encapsulated behind a mature shell and the environment becomes oxidizing (20,21). A subset of cyanobacteria also have a separate β-class carbonic anhydrase, CcaA, that either supplements or supplants this role; CcaA is hexameric and is recruited to the carboxysome through direct interactions with CcmM (22–24). The N-terminal domain of CcmM also recruits CcmN, another absolutely required protein that is believed to help recruit the protein shell (25). In addition to the N-terminal domain, CcmM's C terminus contains three to five copies (depending on the strain) of an ~90-amino-acid domain, abbreviated here as CcmMS. Cyanobacterial (and plant) RubisCO forms a hexadecameric heterooligomeric complex with eight copies of a 55-kDa large, catalytic subunit, RbcL, and eight copies of a 14-kDa small subunit, RbcS (1); RbcS's role seems to lie primarily in stabilizing the oligomer during assembly and fine-tuning RubisCO's activity and specificity (26). CcmMS domains have appreciable sequence similarity to RbcS (27) and similarly bind RubisCO (28). Joining the CcmMS domains are short segments of poorly conserved, highly hydrophilic sequence that likely act as linkers or spacers.
CcmM is also unusual in that it is made in two distinct isoforms: a full-length protein, encompassing all domains (generally referred to as CcmM58), and a shorter version, translated from a conserved internal ribosome entry site, that includes only the CcmMS domains and linkers (CcmM35) (29). Both isoforms are required to form a carboxysome.
Carboxysome assembly appears to be initiated by interactions of CcmM and RubisCO, which together form a dense matrix; this is then partitioned off behind a growing shell (30,31). Both CcmM isoforms are recruited from the beginning of carboxysome formation and remain evenly distributed through the lumen of the mature carboxysome (32). If CcmM is absent, RubisCO remains diffusely distributed through the cytoplasm, and carboxysome formation is never initiated (30). In many cyanobacteria, a homolog of RubisCO activase (Rca) with a single CcmMS-like C-terminal domain is found; these domains are proposed to similarly drive interactions with RubisCO, ensuring that the activase is appropriately encapsulated within the carboxysome.
After the growth and maturation of a procarboxysome consisting primarily of RubisCO and CcmM, the final stages of maturation involve recruitment of a shell, which is proposed to be able to cleave through oversized RubisCO-CcmM procarboxysomes (31). Shell recruitment also correlates with changes to the interior of the carboxysome, notably reorganization of RubisCO into a paracrystalline array (30,33) and a switch to an oxidizing environment (31) required to activate the carbonic anhydrase functionality of CcmM (20).
Although the importance of CcmM to the recruitment of RubisCO to the carboxysome is clear, the details of how this occurs remain unresolved. One complication is that RbcL recruits RbcS (which CcmM possibly displaces) through a complex process that generally entails at least one of two distinct dedicated chaperones, RbcX and Raf1, stabilizing the RbcL2 dimers (which themselves require GroEL/ES to fold) and then RbcL8 octamers in a soluble state that permits RbcS recruitment (34). In addition, analysis of protein abundance in cyanobacteria has shown RbcL and RbcS to be present in an approximate 8:5 ratio, leading to the suggestion that the remaining RbcS domains are displaced by CcmM (35). Because CcmMS domains are RbcS homologs, a reasonable hypothesis is then that they share RbcS's requirement to bind RbcL in a chaperone-mediated assembly process (7,28,29,36). Here, we show that CcmMS is structurally similar to RbcS but lacks key motifs associated with RbcL binding. We also show that these domains can bind the mature RubisCO with micromolar affinity but do not cause release of RbcS from the complex and, indeed, bind a RubisCO variant with most of RbcS deleted. Together, these findings indicate that CcmM binds RubisCO at a site distinct from the RbcS site.

CcmM RbcS-like subdomain forms a soluble, well-behaved protein
We cloned the first (N-terminal-most) RbcS-like domain spanning residues 226–320 of Thermosynechococcus elongatus BP-1 CcmM (henceforth CcmMS1) into an expression vector as both a His-tagged and tag-free variant. The resulting 94-amino-acid protein fragment was found to be expressed well, could be purified and concentrated up to 120 mg/ml in a simple buffer, and remained soluble indefinitely; this suggests that CcmMS forms a compact, well-folded structural subdomain that is stable independently of other CcmM motifs or any interaction with RubisCO. We also expressed and purified both WT RubisCO and an RbcL/RbcX/RbcSΔ18 construct (RbcS with a stop codon at position 18, which allows only the first 17 amino acids of RbcS to be translated; henceforth RbcLSΔ18). Both of these RubisCO variants were also successfully heterologously expressed and purified. For the RbcLSΔ18 complex, yield and solubility were similar to those of WT RubisCO, and both variants showed similar thermal denaturation profiles except that the RbcLSΔ18 profile was missing a denaturation peak around 75°C.

The structure of CcmMS1 resembles RbcS
We determined the structure of CcmMS1 from a tagless construct that crystallized only in the presence of cobalt thiocyanate (which mediates important crystal contacts). We exploited the cobalt ions to perform cobalt single-wavelength anomalous

CcmM binds RubisCO independently of RbcS
diffraction phasing (at 1.6 Å); this initial structure was then re-refined against a high-resolution (1.35-Å) native data set (structure statistics are shown in Table 1). The crystals proved to be of space group P2₁2₁2₁ with two CcmMS1 molecules in the asymmetric unit; these chains are very similar in structure, superimposing with an r.m.s.d. of 0.33 Å. The ordered portion of the domain comprises 88 residues with the five residues C-terminal to 313 being disordered in both protomers; we therefore conclude that these residues are part of the linker between domains rather than part of the domain itself.
As anticipated from sequence similarity, the structure of CcmMS1 resembles RbcS and is organized as a four-stranded, antiparallel β-sheet (with topology 2, −1, 3, −4) with two α-helices packing on one surface (Fig. 1, A and B). The β1-β2 loop forms an additional, short α-helix (α1a) that sits at the end of the β-sheet. This organization places CcmMS1's N and C termini immediately adjacent to one another. Mapping sequence conservation onto the CcmMS1 structure reveals an extensive, very conserved face to the protein, spanning the exposed underside of the β-strand through the α1a helix, with a pronouncedly basic, largely polar nature (Fig. 1, C–F). At the center of this face are Glu249, Trp261, and Arg292 (residues also conserved in RbcS as Glu41, Trp53, and Arg86) supplemented by Arg303 and Glu306. Tyr290 stacks under Tyr256 and anchors α1a. The basic residues Arg254, Arg255, Arg257 (all α1a), and Arg301 along with Gln289 complete this extended conserved surface.
Searching the Protein Data Bank (PDB) with this structure using Dali (37) revealed that the closest structural homologs are, as predicted from sequence similarity, RbcS domains. Surprisingly, the closest matches are from red-like RubisCO (rather than the green-like RubisCO characteristic of β-cyanobacteria and higher plants), including red algae proper (e.g. Alcaligenes eutrophus PDB code 1bxn; Z-score, 11) and diatoms (Thalassiosira antarctica PDB code 5mz2; Z-score, 11). The RbcS domain from T. elongatus (PDB code 3zxw; Z-score, 10.6) is the closest green-type RubisCO match. The structures of all RbcS domains from hexadecameric RubisCOs of very different organisms (including higher plants) resemble one another in structure and sequence far more closely than any RbcS resembles CcmMS1, and the slightly better statistics for the algal RbcS proteins likely reflect small, coincidental local similarities rather than a closer evolutionary relationship. We focus our comparisons on the cognate T. elongatus RubisCO (PDB code 2ybv, which is more complete than PDB code 3zxw) as this is the key structure for understanding the interplay between RbcS and CcmM in RubisCO binding.
CcmMS1 and RbcS structures superimpose fairly closely with an r.m.s.d. of 2.1 Å over 83 residues. RbcS, however, is noticeably larger with 103 versus 88 amino acids ordered (Fig. 1B). The majority of additional residues form a long N-terminal loop (residues 3–17) separate from the α/β domain "body" of the protein, whereas other insertions extend the α1 helix and form a two-residue bulge that breaks strand β2. Note that the short helix α1a in CcmM is replaced by an equally long 10-residue loop in RbcS. The surface of RbcS is less conserved than that of the very conserved RbcL and is strongly conserved primarily where residues contact RbcL (Fig. 1E). RbcS is also noticeably less basic than CcmMS1 (Fig. 1F; predicted net charges of +1.4 at neutral pH for CcmMS1 and −1 for the equivalent domain of RbcS). Despite the clear homology and similarity in folds, the underlying sequences are deeply divergent (17% identity between T. elongatus RbcS and CcmMS1; Fig. 1G), resulting in the surfaces of the two proteins having few identical residues and different overall properties (Fig. 1, C–F). Notably, many conserved RbcS residues mediating RbcL contacts are either unconserved or conserved with a different pattern in CcmMS (Fig. 1G).

CcmMS1 lacks key motifs RbcS uses to interact with RbcL
The potential functional implications for the similarities and differences between RbcS and CcmMS1 are best understood in the context of the interactions RbcS makes with the rest of RubisCO. RbcS primarily interacts with two adjacent RbcL protomers where most contacts are to different regions of the first two helices and the last helix (plus a helical bundle extension) of the RbcL β-barrel; additional minor interactions are also formed with a short helix that extends from an RbcL chain in the opposite ring of the hexadecamer and adjacent RbcS domains (Fig. 2A). RbcS's N-terminal loop (residues 3–17) forms extensive interactions with one RbcL protomer (Fig. 2, A and B), with the surface buried by this region comprising approximately a third (845 Å² of the 2600 Å²) of the buried surface between RbcS and the rest of RubisCO; these interactions are estimated to contribute at least two-thirds of the estimated interaction energy (−3.9 of −5.6 kcal/mol; calculations using PISA (38)). The absence of this motif in CcmMS would be anticipated to significantly reduce CcmMS's ability to bind RubisCO. CcmMS does, however, seem to have hydrophobic pockets similar to those in RbcS that interact extensively with the aromatic rings of Tyr10 (Tyr290, Ile308, Arg311, and Tyr256) and Phe13 (Thr230, Glu307, and Phe305) from this loop, suggesting that it could possibly bind this motif in trans (Fig. 2B, inset).

Table 1. Crystallographic data and refinement statistics for single anomalous dispersion and native data collection of CcmMS1. Columns: CcmMS1 (peak), CcmMS1 (native).

A second critical RbcS interaction motif is the highly conserved W56KLPΦF (where Φ is a hydrophobic residue) encompassing β2. This motif makes several contacts with RbcL, most notably an extended hydrophobic patch comprising Met55, Leu58, Pro59, Phe61, and Phe90, which together pack primarily on Trp70, Leu73, and Leu74 contributed by an RbcL protomer from the distal ring (Fig. 2C). The corresponding residues in CcmMS1 form a more regular β-strand (albeit with a broken hydrogen bond at Pro265) with no bulge and smaller, more polar and less conserved residues; this region seems unlikely to interact favorably with the hydrophobic patch on RbcL (Fig. 2C, inset).
In summary, CcmMS1 shows clear overall similarity to RbcS, but the two proteins have very few conserved surface residues in common and have different surface properties. Although many of the polar interactions made by RbcS could plausibly be recapitulated by roughly similar residues of a CcmMS1 molecule placed analogously in a RubisCO complex, the absence of an N-terminal loop and the WKLPΦF hydrophobic bulge motif in CcmM suggests that CcmM is unlikely to outcompete RbcS for access to its binding site by forming more favorable interactions with RbcL.

Figure 2 (partial caption). ... RubisCO. The dashed inset shows the interactions CcmMS1 (blue) could potentially make with RbcS's N-terminal tail in this position. C, the interactions mediated by RbcS's conserved motif. The inset shows the detail of CcmMS1 in the equivalent region; this protein lacks the characteristic β-bulge, and the corresponding residues are smaller, more polar, and weakly conserved.

CcmMS subunit variants bind RubisCO with micromolar affinity
The association and dissociation behavior of RubisCO binding to CcmM was measured using localized surface plasmon resonance (LSPR). CcmMS1 was immobilized on a gold nanoparticle nitrilotriacetic acid chip, and an excess of RubisCO was then flowed over the chip as the analyte to determine binding kinetics. Global analysis of the sensorgram traces using a 1:1 binding model yielded an association rate constant, k_on, of 8.37 × 10² ± (4.83 × 10¹) M⁻¹ s⁻¹ and a dissociation rate, k_off, of 9.74 × 10⁻⁴ ± (7.34 × 10⁻⁷) s⁻¹ (Fig. 3A and Table 2; note that RubisCO concentrations throughout the paper are reported in terms of RbcL-RbcS heterodimer concentrations; these concentrations and subsequent affinities should be divided by 8 to reflect the concentration of the hexadecamer). The calculated dissociation constant, K_D, of this interaction is therefore 1.16 ± (6.83 × 10⁻²) μM. The calculated half-life of the CcmM-RubisCO complex (the inverse of the k_off) in these experiments is ~17 min.
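The relationship between the fitted rate constants and the derived quantities can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the rate constants quoted above, and the function names are chosen here rather than taken from the authors' analysis software. Strictly, 1/k_off is the mean lifetime of the complex, and the first-order half-life, ln(2)/k_off, is somewhat shorter, so both are computed for comparison.

```python
import math

# Rate constants quoted in the text for RubisCO binding to immobilized CcmMS1.
K_ON = 8.37e2    # association rate constant, M^-1 s^-1
K_OFF = 9.74e-4  # dissociation rate constant, s^-1


def dissociation_constant_um(k_on: float, k_off: float) -> float:
    """K_D = k_off / k_on, converted from M to micromolar."""
    return k_off / k_on * 1e6


def mean_lifetime_min(k_off: float) -> float:
    """Mean lifetime of the complex, 1 / k_off, in minutes."""
    return 1.0 / k_off / 60.0


def half_life_min(k_off: float) -> float:
    """First-order half-life, ln(2) / k_off, in minutes."""
    return math.log(2.0) / k_off / 60.0


kd_heterodimer_um = dissociation_constant_um(K_ON, K_OFF)
# The text reports concentrations per RbcL-RbcS heterodimer; dividing by 8
# re-expresses the value per RbcL8S8 hexadecamer.
kd_hexadecamer_um = kd_heterodimer_um / 8.0

print(f"K_D per heterodimer: {kd_heterodimer_um:.2f} uM")
print(f"K_D per hexadecamer: {kd_hexadecamer_um:.3f} uM")
print(f"1/k_off:             {mean_lifetime_min(K_OFF):.1f} min")
print(f"ln(2)/k_off:         {half_life_min(K_OFF):.1f} min")
```

The ~17 min figure in the text corresponds to the 1/k_off value.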
We also tested binding of RubisCO to constructs with multiple CcmMS domains joined by linkers. CcmMS1-2 (which contains the first two CcmMS domains of CcmM and the linker joining them) bound with a K_D almost an order of magnitude lower than that of CcmMS1 alone (0.171 μM). Notably, this increase in affinity is primarily driven by more efficient binding (k_on of 4.2 × 10³ M⁻¹ s⁻¹ for CcmMS1-2 versus 8.37 × 10² M⁻¹ s⁻¹ for CcmMS1) with similar k_off values. CcmMS1-4 (which contains all four CcmMS domains with linkers) bound with an affinity intermediate between CcmMS1 and CcmMS1-2 (0.644 μM). The modest gains in affinity shown by constructs that include more CcmMS domains suggest that after one domain is bound, additional linked domains do not bind additively to RubisCO.

Figure 3. A, LSPR traces for representative runs of RubisCO binding to immobilized CcmMS1, CcmMS1-2, or CcmMS1-4 as well as RbcLSΔ18 binding to immobilized CcmMS1. The dashed vertical line marks the transition between the association and dissociation phases. Note that although all runs show comparable binding kinetics, a significantly more robust signal is observed when binding CcmMS1-2 and CcmMS1-4. B, SEC-RALS traces of RubisCO, CcmMS1, and their complex.
Finally, we analyzed CcmMS1 binding to RbcL-RbcSΔ18; the reasoning behind this experiment was that if CcmMS competes with RbcS for a single binding site, then deleting the competing domain of RbcS should markedly increase CcmMS affinity. Alternatively, if RbcS forms part of CcmMS's binding site, then deleting this motif should result in an appreciable decrease in binding affinity. The RbcL-RbcSΔ18 protein bound CcmMS1 with a K_D of 1.62 μM, a value very similar to that observed with WT RubisCO. This suggests that CcmM binds in a manner that neither depends upon nor competes with RbcS's α/β domain.
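To put the reported affinities on a common scale, the differences between constructs can be expressed as fold changes and as free-energy differences, ΔΔG = RT ln(K_D / K_D,ref). The following is a minimal sketch using only the K_D values quoted in the text; the temperature of 298.15 K is an assumption (the measurement temperature is not stated here), and the function and dictionary names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
TEMP_K = 298.15       # ASSUMPTION: 25 degrees C; not stated in the text

# K_D values quoted in the text, in micromolar (per RbcL-RbcS heterodimer).
KD_UM = {
    "CcmMS1 / RubisCO": 1.16,
    "CcmMS1-2 / RubisCO": 0.171,
    "CcmMS1-4 / RubisCO": 0.644,
    "CcmMS1 / RbcLS-delta18": 1.62,
}
REFERENCE = "CcmMS1 / RubisCO"


def fold_tighter(kd_um: float, ref_um: float) -> float:
    """How many times lower K_D is than the reference (values > 1 are tighter)."""
    return ref_um / kd_um


def delta_delta_g(kd_um: float, ref_um: float, temp_k: float = TEMP_K) -> float:
    """ddG = RT ln(K_D / K_D,ref) in kcal/mol; negative means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_um / ref_um)


ref = KD_UM[REFERENCE]
for name, kd in KD_UM.items():
    print(f"{name:24s} fold = {fold_tighter(kd, ref):5.2f}  "
          f"ddG = {delta_delta_g(kd, ref):+.2f} kcal/mol")
```

Under these assumptions every construct lies within roughly 1.2 kcal/mol of the single-domain value, which is the sense in which the gains are described as modest.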

The CcmMS1-RubisCO complex does not release RbcS
We examined the nature and composition of the CcmM-RubisCO complex by both size-exclusion chromatography–coupled right-angled light scattering (SEC-RALS) and native mass spectrometry (MS). SEC-RALS experiments were run on an integrated system (OMNISEC), which simultaneously reports UV/visible absorbance, index of refraction, viscometry, and RALS measurements over an SEC elution profile. Running the two components independently, RubisCO (comprising RbcL8S8) was determined to have a 537-kDa molecular mass, very close to the calculated molecular mass of 536 kDa (Fig. 3B). CcmMS1 had a measured molecular mass of 10.5 kDa, somewhat lower than the 12.787 kDa calculated mass. When run as a 1 mg/ml + 1 mg/ml mixture of the two proteins, a new major peak appeared with an apparent molecular mass of 678.5 kDa. This peak is homogeneous across its elution profile, consistent with a single species being present. This mass is considerably higher than would be expected if CcmMS1 were replacing RbcS in the complex (the calculated molecular mass of the RbcL8-(CcmMS1)8 complex is 525.4 kDa) but is within about 6% of the value expected for a complex where one CcmMS1 binds per RbcLS heterodimer (RbcL8S8-(CcmMS1)8 calculated molecular mass is 638 kDa).
The composition of protein complexes between RubisCO and CcmMS1 constructs was also examined by nanoelectrospray ionization (nano-ESI) MS. Nano-ESI analysis of purified CcmMS1 yielded a molecular mass of 12.787 kDa (calculated molecular mass, 12.787 kDa), whereas the molecular masses of acid-dissociated RbcS and RbcL dimers agree well with values derived from the sequence. The native RubisCO spectrum showed a molecular mass of 544.74 kDa, 8.66 kDa larger than calculated from the measured component masses, suggesting that the complex binds multiple solute ions during the ESI process. A larger than calculated mass using nano-ESI under similar conditions was also previously observed for tobacco RubisCO (38). Spectra were then collected for samples where CcmMS1 at different concentrations (2, 4, and 6 μM) was preincubated with RubisCO at a fixed concentration of 0.1 μM. These showed the appearance of a series of new peaks in the spectrum (Fig. 4) with molecular masses calculated as 557.61, 570.44, 583.50, 596.25, and 609.09 kDa. These peak molecular masses differ from one another (and the native RubisCO) by the successive addition of 12.869 ± 0.116 kDa, allowing us to identify these species as RbcL8S8 complexes with between one and five CcmMS1 masses added (Fig. 4 and Table 3). The proportion of higher molecular mass complexes increased at higher CcmMS1 concentrations; at the highest CcmMS1 concentrations, the appearance of additional high m/z peaks makes spectra too complex to reliably interpret (Fig. 4D). Of note, all species had masses consistent with eight RbcL and eight RbcS subunits plus a varying number of CcmMS1 subunits, and no peak consistent with free RbcS was detected in any native experiment. These data therefore strongly argue that CcmM binds RubisCO without RbcS being released from the complex.
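The comparison between the observed SEC-RALS mass and the two candidate models can be written out explicitly. This is a small sketch using only the masses quoted in the text; the helper names are illustrative, and the additive mass model (RubisCO plus n copies of CcmMS1) is the simple assumption implied by the comparison above.

```python
RUBISCO_KDA = 536.0             # calculated RbcL8S8 mass from the text
CCMM_S1_KDA = 12.787            # calculated CcmMS1 mass from the text
REPLACEMENT_MODEL_KDA = 525.4   # calculated RbcL8-(CcmMS1)8 mass from the text
OBSERVED_COMPLEX_KDA = 678.5    # SEC-RALS mass of the new peak


def expected_complex_kda(n_ccmm: int) -> float:
    """Mass of RbcL8S8 carrying n_ccmm CcmMS1 domains, assuming simple addition."""
    return RUBISCO_KDA + n_ccmm * CCMM_S1_KDA


def percent_deviation(observed: float, expected: float) -> float:
    """Signed deviation of the observed mass from a model, in percent."""
    return 100.0 * (observed - expected) / expected


additive_8 = expected_complex_kda(8)
print(f"RbcL8S8-(CcmMS1)8 expected: {additive_8:.1f} kDa "
      f"({percent_deviation(OBSERVED_COMPLEX_KDA, additive_8):+.1f}%)")
print(f"RbcL8-(CcmMS1)8 expected:   {REPLACEMENT_MODEL_KDA:.1f} kDa "
      f"({percent_deviation(OBSERVED_COMPLEX_KDA, REPLACEMENT_MODEL_KDA):+.1f}%)")
```

The observed mass sits a few percent above the additive model but far above the replacement model, which is the basis for the conclusion drawn in the text.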
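The mass ladder described above can be reproduced from the quoted peak masses. The sketch below is a hedged illustration rather than the authors' analysis: it takes successive differences between the native RubisCO mass and the five complex masses, then assigns each peak a CcmMS1 copy number from the mean step. Because the input masses are rounded to two decimals, the computed spread may differ slightly from the reported ±0.116 kDa.

```python
from statistics import mean, stdev

NATIVE_RUBISCO_KDA = 544.74  # native RbcL8S8 mass by nano-ESI, from the text
COMPLEX_MASSES_KDA = [557.61, 570.44, 583.50, 596.25, 609.09]


def successive_differences(masses):
    """Difference between each mass and the next lightest one in the series."""
    return [heavier - lighter for lighter, heavier in zip(masses, masses[1:])]


ladder = [NATIVE_RUBISCO_KDA] + COMPLEX_MASSES_KDA
steps = successive_differences(ladder)
step_mean = mean(steps)
step_sd = stdev(steps)  # sample standard deviation

# Assign each peak the nearest whole number of added CcmMS1 masses.
copies = [round((m - NATIVE_RUBISCO_KDA) / step_mean) for m in COMPLEX_MASSES_KDA]

print("steps (kDa):", [f"{s:.2f}" for s in steps])
print(f"mean step: {step_mean:.2f} kDa, sample SD: {step_sd:.3f} kDa")
print("assigned CcmMS1 copies:", copies)
```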

Discussion
CcmM's interaction with RubisCO drives the early stages of β-carboxysome assembly, allowing the cross-linking of free RubisCO into a large, amorphous body known as the procarboxysome (30–32). The details of the stoichiometry, affinity, dynamics, and structure of the CcmM-RubisCO complex are therefore central to understanding carboxysome biogenesis. CcmM's small subdomains are clearly homologous to RbcS, and this, along with cyanobacterial proteomics data that indicate that RbcS is depleted relative to RbcL (35), previously suggested a model where CcmM binds by replacing RbcS in some of its sites. Because cyanobacterial RubisCO assembly is complex and RbcS binding depends upon interventions by several chaperones (34), if CcmM is functioning primarily as an RbcS mimic, one might expect it to show a similar dependence upon the in vivo RubisCO assembly process. We found, in contrast, that the interaction between CcmMS domains and mature RubisCO occurs readily in solution, independently of any additional cellular factors, with CcmMS1 binding RubisCO with an ~1.16 μM K_D. Binding RubisCO with a multi-CcmMS domain CcmMS1-2 or CcmMS1-4 construct results in only a small increase in affinity, suggesting that binding energy is not additive and that only one domain is able to bind RubisCO at a time. This is consistent with in vivo experiments showing that a CcmM variant with only two CcmMS subunits cross-links RubisCO into a procarboxysome (30) and our observations that mixing this construct with RubisCO in solution results in rapid precipitation. These LSPR experiments also suggest a 17-min half-life for the CcmMS1-RubisCO complex. A long half-life is similarly suggested by previously published fluorescence recovery after photobleaching experiments that show that individual RubisCO molecules within a growing procarboxysome are not free to move on a minute time scale (31). Conversely, Chen et al. propose (31) that shell formation may require the advancing shell edge to cleave through the mass of CcmM-cross-linked RubisCO during the last 2 h of carboxysome assembly; CcmM release from RubisCO may therefore be a necessary event in the later stages of carboxysome maturation with the observed half-life at least commensurable with the time scale of these events.
We investigated the stoichiometry and content of the complex using SEC-RALS and native MS experiments. Contrary to models where CcmM displaces RbcS from the RubisCO complex (35), these data show that binding of CcmM occurs with no release of RbcS. These SEC-RALS and native MS data also show that RubisCO can bind multiple CcmMS1 subunits. The maximal stoichiometry is difficult to ascertain (the MS data become too complex to interpret as complex sizes increase, and the uncertainty in the SEC-RALS data is larger than the mass of a single CcmMS1 protomer), but stoichiometry exceeds 5:8 CcmM:RbcL, and the SEC-RALS data suggest that the maximal binding stoichiometry is ~1:1. Characterization of whole-cell lysates of Synechococcus elongatus PCC7942 suggested that the total CcmM58 + CcmM35 concentration is roughly equal to the RbcL concentration (35). Given that these isoforms each contain three CcmMS domains, more than enough CcmMS domains would appear to be available in vivo to saturate available RbcL subunits.
We also determined the structure of CcmMS1, showing the predicted overall resemblance to RbcS; however, detailed comparison with RbcS shows that CcmM conserves very few of the motifs RbcS uses to interact with RbcL. In particular, the N-terminal loop of RbcS (residues 3–17 in T. elongatus) is responsible for about one-third of the overall surface buried between RbcS and RbcL and is estimated to account for about two-thirds of the interaction energy. Furthermore, single point mutations R8G and E11V (in Anabaena RbcS) prevent holoenzyme assembly of recombinant RubisCO in Escherichia coli, arguing that this N-terminal loop is likely indispensable for stabilizing RubisCO (26,39). This motif is completely absent in CcmMS where equivalent residues show negligible conservation and form part of the linker, although CcmMS does retain a pair of hydrophobic pockets that could interact with a bound N-terminal loop. CcmMS also lacks a very conserved RbcS motif around β2, which protrudes as a β-bulge to bury several bulky hydrophobic residues in an RbcL hydrophobic pocket. Overall, inspection of CcmM suggests that it does not closely mimic RbcS's binding surface, and it is not obvious how it might outcompete RbcS for its binding site.
One possible model to explain these findings is that CcmMS binds by displacing RbcS's structural body (residues 18–end) but not the N-terminal loop (residues 1–17) for which CcmM has no equivalent residues; this RbcS region might then help contribute to the CcmMS-binding site while tethering the rest of RbcS to the complex (Fig. 5A). Although CcmMS might not make especially strong interactions with RbcL, this could be offset by [...]
[...], suggesting that the RbcS α/β domain neither competes with nor contributes significantly to the CcmMS1-binding site. Together, these data strongly indicate that CcmM, despite its clear homology to RbcS, utilizes a binding site on RubisCO that is distinct from that occupied by RbcS. If CcmM does not bind in the RbcS site, where does it bind? Although we have been unable to obtain a crystal of this complex, there are some constraints: in particular, CcmM likely does not significantly contact RbcS and is unlikely to impinge on the symmetry axes of the complex (or else binding sites would overlap, forcing substoichiometric binding). One strong candidate site is the highly electronegative groove that runs between RbcL dimers and widens out into a large pocket away from RubisCO's equator. Given that the conserved surface of CcmM is markedly electropositive, binding in this area could be stabilized by multiple salt bridges; one possible configuration is depicted in Fig. 5B. Of note, the equivalent pocket in higher plant RubisCO is far less electronegative (e.g. PDB code 3axm).

Figure 4 (partial caption). Ion peaks with m/z ratios consistent with additional mass from binding of one to five CcmMS1 domain(s) are indicated as RbcL8S8-(CcmMS1)1 (green), RbcL8S8-(CcmMS1)2 (violet), RbcL8S8-(CcmMS1)3 (blue), RbcL8S8-(CcmMS1)4 (brown), and RbcL8S8-(CcmMS1)5 (red).

Table 3 (footnotes). a, Difference between the experimental molecular mass measured for a given complex and that of the next lightest complex in the series. b, Masses of CcmMS1 and RbcL were calculated assuming that the N-terminal methionine is removed by methionine aminopeptidase during expression.

The linker between successive CcmMS domains averages 32 ± 6 residues long across all CcmM homologs (using the first and last structured CcmMS1 residues as a benchmark). Two successive CcmMS domains would tend to average about 34 Å apart (assuming a perfectly flexible linker and a random walk configuration), reaching 100 Å if the linker is stretched taut. Because the holoenzyme forms a roughly 11-nm rounded cube, most candidate CcmMS-binding sites on RubisCO would be further apart from their symmetry copies than the linker would prefer to span. Our LSPR data indicate that the presence of additional binding domains in a CcmM molecule offers little advantage in binding RubisCO, with K_D values of 1.16, 0.171, and 0.644 μM for a single domain, two linked domains, and four linked domains, respectively. Although the longest variant would almost certainly allow two domains to bind a single RubisCO at once, the energetic cost in terms of loss of conformational entropy (possibly combined with residual structure or self-interaction by the linker) seems large enough to prevent linked CcmMS domains from binding favorably to the same RubisCO molecule. These biophysical properties likely serve to ensure that CcmMS domains prefer to bind multiple RubisCO molecules, cross-linking them into an extended aggregate, the procarboxysome (Fig. 5C).
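The text does not state which polymer model underlies the 34 Å and 100 Å figures. As a rough consistency check, and assuming an ideal (freely jointed) chain, for which the mean-square end-to-end distance equals the Kuhn length times the contour length, the sketch below back-calculates the per-residue rise and the effective Kuhn length implied by those two numbers. All function names are illustrative, and the ideal-chain model is an assumption made here, not a description of the authors' calculation.

```python
import math

LINKER_RESIDUES = 32       # mean linker length quoted in the text
CONTOUR_LENGTH_A = 100.0   # fully stretched span quoted in the text, in angstroms
RMS_DISTANCE_A = 34.0      # typical separation quoted in the text, in angstroms
RUBISCO_EDGE_A = 110.0     # the holoenzyme is described as a roughly 11-nm cube


def rise_per_residue(contour_a: float, n_residues: int) -> float:
    """Contour length contributed by each residue when the linker is taut."""
    return contour_a / n_residues


def kuhn_length(rms_a: float, contour_a: float) -> float:
    """Ideal-chain Kuhn length b from <R^2> = b * L."""
    return rms_a ** 2 / contour_a


def rms_end_to_end(kuhn_a: float, contour_a: float) -> float:
    """Ideal-chain root-mean-square end-to-end distance, sqrt(b * L)."""
    return math.sqrt(kuhn_a * contour_a)


b = kuhn_length(RMS_DISTANCE_A, CONTOUR_LENGTH_A)
print(f"rise per residue:    {rise_per_residue(CONTOUR_LENGTH_A, LINKER_RESIDUES):.2f} A")
print(f"implied Kuhn length: {b:.2f} A")
print(f"RMS span / RubisCO edge:  {RMS_DISTANCE_A / RUBISCO_EDGE_A:.2f}")
print(f"taut span / RubisCO edge: {CONTOUR_LENGTH_A / RUBISCO_EDGE_A:.2f}")
```

Under this assumption the typical linker span is about a third of the holoenzyme's edge length, and even the taut linker is slightly shorter than that edge.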
Although the structure and possible interaction mode help clarify CcmM's role in assembling RubisCO into a procarboxysome, there are hints that these CcmMS domains may have additional roles to play in carboxysome maturation. A growing β-cyanobacterial cell generally contains three to six carboxysomes with one more under construction (31). This requires that ~10–20% of cellular RubisCO is located in the cytosol where CO2 levels are low and O2 levels are high, conditions that promote photorespiration. This would seem to confer a strong selective pressure to maintain RubisCO in assembling procarboxysomes in an inactive state, using either the changes in conditions as the protein shell segregates RubisCO from the cytosol or the changes in protein interactions (such as release of an inhibitory CcmM) as an activation trigger. This model has a clear precedent in the behavior of CcmM's N-terminal domain where the transition from reducing to oxidizing conditions upon shell formation of the carboxysome activates its carbonic anhydrase activity (20,21,31). The requirement for cooperative activation might explain the otherwise puzzling observation that the RubisCO-CcmM complex reorganizes from an amorphous state in immature procarboxysomes into paracrystalline layers in mature carboxysomes (33,40). Given that CcmM tightly binds RubisCO during procarboxysome assembly and is present in near stoichiometric quantities, and the tightness of packing implied by the paracrystalline spacing (which, at 11 nm, is similar to the diameter of RubisCO), it is likely that CcmM either helps mediate the interactions that stabilize the more ordered paracrystalline state or releases RubisCO to allow this reorganization.

Molecular biology
DNA sequences were amplified from T. elongatus BP-1 genomic DNA (a generous gift from the Kazusa Research Institute, Japan) using pairs of primers as listed in Table 1 and iProof DNA polymerase (Bio-Rad). Successful amplicons were digested using the listed restriction endonucleases (New England Biolabs). pET28a vector (used for all constructs) was digested using the appropriate restriction endonuclease pair, treated with alkaline phosphatase, and purified using agarose gel electrophoresis. Insert and vector were then ligated using T4 DNA ligase and transformed into chemically competent DH5α cells. Site-directed mutagenesis was performed by PCR using Pfu-X7 DNA polymerase (a gift from Dr. D. Christendat, University of Toronto) with overlapping primers (Table 4; Thermo Fisher). All plasmids were verified by sequencing at the Advanced Analysis Center, University of Guelph. Chemically competent BL21(DE3) cells were transformed with verified plasmids for overexpression. For RubisCO construct variants, cells were also transformed with pACYC-GroEL/ES-TF plasmid (Addgene) that expresses GroEL/ES (as well as trigger factor) needed for RubisCO maturation.

Protein expression and purification
Transformed BL21(DE3) cells were grown overnight in 5 ml of LB at 37°C and used to inoculate 1 liter of 2× yeast tryptone medium. Once the optical density at 600 nm reached 0.8, cultures were induced with 1 mM isopropyl β-D-1-thiogalactopyranoside and allowed to further incubate overnight. Cells were pelleted by centrifugation at 4,400 × g and resuspended in 35 ml of lysis buffer (20 mM Tris, pH 8, 150 mM NaCl). CcmM variant constructs could then be frozen at −20°C for storage, whereas RubisCO preparations required working with fresh cells. Cell pellets were lysed using a Misonix XL-2020 sonicator, and cellular debris was pelleted by centrifugation at 48,384 × g for 30 min.
For N-terminally hexahistidine-tagged CcmM variants, the supernatant was then loaded onto a 1-ml HisTrap Fast Flow column (GE Healthcare) using an ÄKTA FPLC at 4°C and then eluted using a gradient of 0–250 mM imidazole. Eluted protein was buffer-exchanged and concentrated using a 10-kDa molecular mass-cutoff Amicon filter unit, and the purity was assessed using SDS-PAGE. These CcmM variants were used for all experiments except crystallization. For the tagless CcmMS1 variant used for crystallization, cell growth and purification proceeded as above except that pellets were resuspended in 20 mM MES, pH 5.5. Instead of metal-affinity chromatography, cell-free lysate was loaded onto a 5-ml HiTrap SP Sepharose Fast Flow column (GE Healthcare) in 20 mM MES, pH 5.5; washed; and then eluted across a 0–0.5 M NaCl gradient over 20 column volumes.
For RubisCO purifications, growth was scaled up to 4 liters; cell pellets were never frozen; and all cell handling, lysis, protein purification, and protein storage steps were performed at room temperature or above (as the protein precipitates at 4°C). Cell pellets were suspended in a buffer comprising 50 mM Tris-HCl, pH 7.6, 1 mM EDTA, 20 mM β-mercaptoethanol, 10 mM MgSO4, and 10 mM NaHCO3 prior to lysis in a lukewarm water bath. After centrifugation, the supernatant was slowly brought up to 20% (w/v) ammonium sulfate and stirred at room temperature for 1 h. Protein was centrifuged at 48,384 × g and 25°C for 30 min, and ammonium sulfate was added to 30% (w/v) and stirred overnight at room temperature. After centrifugation as above, the supernatant was resuspended in lysis buffer and dialyzed against the same buffer overnight. After centrifugation, the supernatant was loaded onto a 5-ml HiTrap Q Sepharose Fast Flow column (GE Healthcare) in the same buffer and then eluted using a linear gradient from 0 to 0.5 M NaCl over 40 column volumes. Both RubisCO variants eluted at ~0.25 M NaCl.
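The position of the ~0.25 M NaCl elution point within the anion-exchange gradient can be estimated with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes an ideal linear gradient with no system dead volume or mixer delay, which a real FPLC run would not satisfy exactly.

```python
def gradient_concentration(volume_ml, column_volume_ml, gradient_cv,
                           start_molar, end_molar):
    """Salt concentration (M) after `volume_ml` of an ideal linear gradient."""
    total_ml = column_volume_ml * gradient_cv
    # Clamp to the gradient limits so volumes outside the run stay in range.
    fraction = min(max(volume_ml / total_ml, 0.0), 1.0)
    return start_molar + fraction * (end_molar - start_molar)


def volume_at_concentration(target_molar, column_volume_ml, gradient_cv,
                            start_molar, end_molar):
    """Gradient volume (ml) at which an ideal linear gradient reaches `target_molar`."""
    total_ml = column_volume_ml * gradient_cv
    fraction = (target_molar - start_molar) / (end_molar - start_molar)
    return fraction * total_ml


# 5-ml HiTrap Q column, 0 to 0.5 M NaCl over 40 column volumes (values from the text).
elution_ml = volume_at_concentration(0.25, 5.0, 40, 0.0, 0.5)
print(f"0.25 M NaCl is reached after {elution_ml:.0f} ml of gradient")
```

Under these idealized assumptions the elution point falls at the gradient midpoint, that is, 20 column volumes into the run.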

SEC-RALS
SEC-RALS experiments were run on a Malvern OMNISEC instrument using PBS solution as the running buffer. The system was calibrated against a BSA standard immediately before the measurement runs. Protein samples for individual proteins were loaded at 5 mg/ml. For runs with RubisCO plus CcmMS, each protein component was added at 1 mg/ml. Samples were separated over two tandem P3000 columns with a flow rate of 1 ml/min.

Mass spectrometry
Protein samples were buffer-exchanged into 100 mM aqueous ammonium acetate, pH 6.8, using either a 3-kDa-cutoff Amicon 0.5-ml microconcentrator (EMD Millipore, Billerica, MA) for CcmMS1 or a Bio-Rad 10DG desalting column for RubisCO. To determine accurate molecular mass values of the RbcS and RbcL subunits, RubisCO aqueous solution was acidified to pH 3 by addition of acetic acid. Nano-ESI MS measurements were performed on a Synapt G2S quadrupole-ion mobility separation-TOF mass spectrometer (Waters) equipped with a nanoflow ESI source. Nano-ESI was performed by applying a voltage of ~1 kV to a platinum wire inserted into the nano-ESI tip, which was produced from a borosilicate glass capillary (1.0-mm outer diameter, 0.68-mm inner diameter) pulled to ~5-μm outer diameter using a P-1000 micropipette puller (Sutter Instruments, Novato, CA). The source temperature was 60°C, and gas flow rate was 2 ml/min. The cone, trap, and transfer voltages were 30, 5, and 2 V, respectively. MassLynx software (version 4.1) was used for data acquisition and processing. Mass spectra were averaged over ~500 scans.
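Molecular masses are typically recovered from ESI spectra by deconvoluting a charge-state series. The following is a minimal sketch of the standard two-adjacent-peak calculation for positive-ion mode, not the MassLynx procedure used here. The function names and the 13,000-Da example mass are hypothetical, and the sketch assumes that all charge arises from protonation.

```python
PROTON_MASS = 1.007276  # Da, mass of the charge carrier (assumes protonation only)


def mz_of(mass_da, charge):
    """m/z of an ion of neutral mass `mass_da` carrying `charge` protons."""
    return (mass_da + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_high, mz_low):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    `mz_high` is the peak with charge z; `mz_low` is its neighbor with charge z + 1.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass


# Synthetic example: a hypothetical 13,000-Da protein observed at 7+ and 8+.
peak_7 = mz_of(13000.0, 7)
peak_8 = mz_of(13000.0, 8)
charge, mass = mass_from_adjacent_peaks(peak_7, peak_8)
print(f"z = {charge}, M = {mass:.2f} Da")
```

In practice the estimate would be averaged over every adjacent pair in the series; the two-peak form is shown only to make the arithmetic explicit.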

Localized surface plasmon resonance
Binding of CcmM variants to RubisCO variants was analyzed using an OpenSPR LSPR biosensor (Nicoya Life Sciences Inc., Kitchener, Ontario) according to the manufacturer's instructions. All experiments utilized 20 mM Tris, pH 8.0, with 150 mM NaCl as the running buffer, and a new chip was used for each ligand. In all experiments, the CcmM construct variant serving as the ligand was immobilized to the nitrilotriacetic acid sensor chip through one injection of 100 μg/ml protein at 20 μl/min for 5 min. Once a stable baseline was achieved, analyte (RbcLS or RbcL/SΔ18) at 20 μl/min was injected for 4 min. Dissociation was then monitored for 26 min, and the signal was allowed to return to stable baseline before the next injection. Analyte injections were performed at 500, 750, and 1000 μg/ml protein, each in triplicate. Sensorgram traces of the CcmM-RubisCO interaction were recorded and analyzed using TraceDrawer software (Ridgeview Instruments).
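Sensorgrams of this kind are commonly interpreted with a 1:1 (Langmuir) interaction model. The sketch below illustrates that model in general terms and is not the TraceDrawer fitting routine. The rate constants and the 500-kDa analyte mass are hypothetical round numbers, chosen only so that kd/ka equals the 1.16 μM affinity reported for a single CcmMS domain.

```python
import math


def ug_per_ml_to_molar(conc_ug_per_ml, mass_da):
    """Convert a mass concentration (ug/ml) to molarity (mol/l)."""
    return conc_ug_per_ml * 1e-3 / mass_da  # (g/l) / (g/mol)


def equilibrium_occupancy(conc_molar, kd_molar):
    """Fraction of ligand sites occupied at equilibrium for 1:1 binding."""
    return conc_molar / (conc_molar + kd_molar)


def association_response(t_s, conc_molar, ka, kd, r_max):
    """Response during analyte injection: R(t) = Req * (1 - exp(-(ka*C + kd)*t))."""
    k_obs = ka * conc_molar + kd
    r_eq = r_max * ka * conc_molar / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


def dissociation_response(t_s, r_start, kd):
    """Response after the injection ends: R(t) = R0 * exp(-kd*t)."""
    return r_start * math.exp(-kd * t_s)


# Hypothetical parameters (not measured values): ka in 1/(M*s), kd in 1/s.
KA, KD_RATE, R_MAX = 1.0e4, 1.16e-2, 100.0
conc = ug_per_ml_to_molar(500.0, 500_000.0)  # assumed 500-kDa analyte
r_end = association_response(240.0, conc, KA, KD_RATE, R_MAX)  # 4-min injection
print(f"C = {conc * 1e6:.2f} uM, occupancy = "
      f"{equilibrium_occupancy(conc, KD_RATE / KA):.3f}, R(240 s) = {r_end:.1f}")
```

A global fit of such curves across the three analyte concentrations yields ka and kd, and the equilibrium dissociation constant then follows as their ratio, kd/ka.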

Protein crystallization and structure determination
Crystals were grown from a solution of 1.8 M ammonium sulfate, 0.2 M sodium thiocyanate, 10 mM CoCl2, 0.1 M MES, pH 6.5, against a 30 mg/ml protein solution in a sitting-drop configuration. Spontaneously nucleating crystals typically grew as large, thin, purple plates that formed stacked columns. Diffraction-quality crystals were obtained by microseeding using a seed stock prepared by vortexing initial crystals and then diluting the fragments 1,000-fold in crystallization well solution. Crystals were transferred into Paratone N oil to remove excess solution prior to flash freezing in liquid nitrogen. Data were collected at the Canadian Light Source, beamline 08ID-1. After the anomalous scattering peak was established using X-ray fluorescence, a data set was collected at a wavelength of 1.60522 Å, the cobalt anomalous scattering peak; the resolution of this data set was limited by detector geometry rather than crystal quality. A high-resolution native data set was also collected at a wavelength of 0.97949 Å. Data were processed in XDS and scaled in XSCALE (41). The peak data set was used to determine the structure by single-wavelength anomalous diffraction using AutoSol in Phenix (42), resulting in a figure of merit of 0.58 for the experimental maps. This was followed by iterative manual rebuilding in Coot (43) and refinements in Phenix. Following refinement, molecular replacement of the native crystal data set was conducted using the single-wavelength anomalous diffraction-refined structure as a model followed by multiple refinement cycles in Coot. Both data sets show strong translational pseudosymmetry with a peak height of 28.4% of the origin at fractional cell coordinates 0.213, 0.500, 0.00. All structure figures were prepared in PyMOL v2.0.
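The two collection wavelengths can be related to photon energies through E = hc/λ, which makes it easier to see where the peak data set sits relative to the cobalt K absorption edge (tabulated near 7.71 keV). The sketch below is a simple unit conversion with a hypothetical function name; it is not part of the data-processing pipeline described above.

```python
HC_EV_ANGSTROM = 12398.4198  # Planck constant times speed of light, in eV*Angstrom


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy (eV) for an X-ray wavelength given in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom


# Wavelengths taken from the text.
for label, wavelength in (("Co peak", 1.60522), ("native", 0.97949)):
    energy = wavelength_to_energy_ev(wavelength)
    print(f"{label}: {wavelength} A = {energy / 1000:.3f} keV")
```

The peak wavelength corresponds to roughly 7.72 keV, just above the tabulated cobalt K edge, which is consistent with its selection from the fluorescence scan.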