Single Binding Mode Integration of Hemicellulose-degrading Enzymes via Adaptor Scaffoldins in Ruminococcus flavefaciens Cellulosome*

The assembly of one of Nature's most elaborate multienzyme complexes, the cellulosome, results from the binding of enzyme-borne dockerins to reiterated cohesin domains located in a non-catalytic primary scaffoldin. Generally, dockerins present two similar cohesin-binding interfaces that support a dual binding mode. The dynamic integration of enzymes in cellulosomes, afforded by the dual binding mode, is believed to incorporate additional flexibility in highly populated multienzyme complexes. Ruminococcus flavefaciens, the primary degrader of plant structural carbohydrates in the rumen of mammals, uses a portfolio of more than 220 different dockerins to assemble the most intricate cellulosome known to date. A sequence-based analysis organized R. flavefaciens dockerins into six groups. Strikingly, a subset of R. flavefaciens cellulosomal enzymes, comprising dockerins of groups 3 and 6, were shown to be indirectly incorporated into primary scaffoldins via an adaptor scaffoldin termed ScaC. Here, we report the crystal structure of a group 3 R. flavefaciens dockerin, Doc3, in complex with ScaC cohesin. Doc3 is unusual as it presents a large cohesin-interacting surface that lacks the structural symmetry required to support a dual binding mode. In addition, dockerins of groups 3 and 6, which bind exclusively to ScaC cohesin, display a conserved mechanism of protein recognition that is similar to Doc3. Groups 3 and 6 dockerins are predominantly appended to hemicellulose-degrading enzymes. Thus, single binding mode dockerins interacting with adaptor scaffoldins exemplify an evolutionary pathway developed by R. flavefaciens to recruit hemicellulases to the sophisticated cellulosomes acting in the gastrointestinal tract of mammals.

Plant cell wall polysaccharides, primarily cellulose and hemicellulose, are the most abundant organic molecules produced in Nature, thus constituting a major reservoir of carbon and energy (1). The intricate organization of structural carbohydrates in plant cell walls and their inherent heterogeneity pose significant constraints to polysaccharide degradation, which usually requires a wide array of catalytic activities acting cooperatively (2,3). In highly competitive anaerobic environments, such as the rumen of mammals, enzymatic systems that recycle the carbon stored in plant cell walls are organized in high molecular mass multienzyme complexes termed cellulosomes (4,5). Molecular integration of microbial biocatalysts into these extremely elaborate nanomachines results from the binding of enzyme-borne dockerin modules (Doc) to reiterated cohesin domains (Coh) located in large non-catalytic scaffoldins, a mechanism that promotes enzyme synergy and stability. In addition, recruitment of cellulosomes to the bacterial cell surface via divergent Coh-Doc interactions allows the immediate uptake of released sugars, which are used by microbes as an energy source.
Ruminococcus flavefaciens is a Gram-positive, anaerobic bacterium of the Firmicutes phylum and the only species in the rumen that has been shown to possess a definitive cellulosome. With over 220 Doc-containing proteins, R. flavefaciens strain FD-1 has potentially the most complex cellulosome described to date (Fig. 1) (6). Based on primary structure identity, R. flavefaciens Docs have been organized into six major groups (7). Recently, classification of Ruminococcus Docs into groups was shown to be functionally relevant as members of the same Doc group present similar Coh specificities (8). Scaffoldin B (ScaB) is the major player in R. flavefaciens cellulosomal organization. Through the binding to five ScaA scaffoldins, it allows the simultaneous integration of up to 14 dockerin-containing proteins in a single cellulosome. These modular proteins possess catalytic modules with different activities, including glycoside hydrolases, carbohydrate esterases, polysaccharide lyases, carbohydrate-binding modules, and also domains with currently unknown function (Fig. 1). The binding of the C-terminal group 4 Doc located in ScaB to the Coh of ScaE, a cell-bound anchoring scaffoldin, provides the molecular mechanism to tether the R. flavefaciens cellulosome to the bacterial cell surface (9). Unique to the R. flavefaciens FD-1 cellulosome is the presence of the adaptor scaffoldin ScaC, which contains a group 1 Doc and thus can interact with either ScaA or ScaB (10). ScaC also contains a single Coh that is capable of interacting with groups 3 and 6 Docs. The ScaC adaptor scaffoldins may thus modulate integration of alternative types of enzymes into the cellulosome when this is functionally relevant.
Structural studies on cohesin-dockerin complexes from Clostridium thermocellum (12,13), Clostridium cellulolyticum (14), and Acetivibrio cellulolyticus (15) revealed that the observed primary structure duplication in Docs appended to cellulosomal enzymes supports a dual binding mode with their target protein partners. This consists of the dockerin's ability to bind the cohesin in two different orientations, 180° opposite to each other. The dual binding mode is believed to confer additional flexibility to the macromolecular organization of cellulosomes. Primary structure analysis revealed that R. flavefaciens groups 3 and 6 Docs, although appended to enzymes, do not seem to possess the internal sequence symmetry found in other enzyme-associated Docs that is required to support the dual binding mode. Here, we report the structure of the protein complex between ScaC Coh and a group 3 Doc from R. flavefaciens FD-1. A comprehensive biochemical analysis guided by structural information confirmed that groups 3 and 6 Docs present a single cohesin-binding interface. Because dockerins of groups 3 and 6 are appended, essentially, to hemicellulases, the data suggest that R. flavefaciens FD-1 has evolved an original molecular mechanism, using single binding mode dockerins that exclusively interact with adaptor scaffoldins, to recruit this subset of highly important plant cell wall-degrading enzymes to the cellulosome.

* This work was supported in part by Fundação para a Ciência e a Tecnologia (Lisbon, Portugal) Grant PTDC/BIA-MIC/5947/2014. The authors declare that they have no conflicts of interest with the contents of this article.
This article contains supplemental Tables S1-S4 and Figs. S1-S4.
The atomic coordinates and structure factors (code 5LXV) have been deposited in the Protein Data Bank (http://wwpdb.org/).
1 Supported by the individual fellowship SFRH/BD/86821/2012.
2 To whom correspondence should be addressed. E-mail: cafontes@fmv.ulisboa.pt.

Results and Discussion
Expression and Crystallization of a Novel R. flavefaciens Coh-Doc Complex-In a previous study (8), R. flavefaciens groups 3 and 6 Docs were shown to bind specifically to the Coh of adaptor scaffoldin ScaC. Out of the 21 Docs selected for those studies, the Doc of protein WP_009985128 displayed the highest levels of expression. WP_009985128 contains a 730-residue-long N-terminal X141 module of unknown function. X141 was expressed individually, and its capacity to degrade a range of substrates was evaluated. The data revealed that X141 is unable to attack structural polysaccharides, including pectins (data not shown). In addition, WP_009985128 also contains an internal family 6 carbohydrate-binding module (CBM6) and a C-terminal group 3 Doc (defined henceforth as Doc3). To gain insights into the molecular mechanisms of cellulosome assembly involving adaptor scaffoldins, the complex combining ScaC Coh (CohScaC) and Doc3 was expressed at high levels, purified, and crystallized. Established strategies for the production and purification of Coh-Doc complexes, which involve the heterologous co-expression of both proteins in Escherichia coli (11), were employed and yielded high quality crystals of the RfCohScaC-Doc3 complex.

FIGURE 1. Group-specific interactions that contribute to cellulosome assembly in R. flavefaciens strain FD-1. The scheme is color-coded to highlight the four subgroups of cohesin-dockerin specificities as follows: dockerins and cognate cohesin counterparts of the different groups are shown in light blue (group 1 dockerins), yellow (groups 3 and 6), green (groups 2 and 4), and red (group 5), respectively. Group 2 dockerins are truncated derivatives of group 4 and are not represented in the figure for simplification. The red oval marks the complex of the group 3 interaction, whose structure is reported here.
Structure of the R. flavefaciens CohScaC-Doc3 Complex-The crystal structure of the RfCohScaC-Doc3 complex was solved by molecular replacement using the structure of the C. thermocellum type I complex (PDB code 2ccl (12)) as a search model. The RfCohScaC-Doc3 structure included two molecules of the heterodimer in the asymmetric unit, as well as 118 water molecules, with each Doc coordinating two calcium ions. The dimer resulted from interactions established between two CohScaC modules. Thus, Nε2 of molecule A CohScaC's Gln-6 interacts with Oε1 of molecule B CohScaC's Gln-19, whereas N of molecule A CohScaC's Lys-163 hydrogen bonds Oδ1 of molecule B CohScaC's Thr-114. The biological relevance of these crystallographic interactions, if any, is presently unclear. The RfCohScaC-Doc3 complex displayed an elongated shape with overall dimensions of 30 × 35 × 60 Å and includes residues 3-174 from CohScaC and residues 889-953 of Doc3 from R. flavefaciens FD-1 (Fig. 2). Crystal parameters and data collection statistics are summarized in Table 1.

TABLE 1 Crystal parameters and data collection statistics (RfCohScaC-Doc3)
Values in parentheses are for the highest resolution shell.
Data collection: beamline I04-1, Diamond; space group P2 1

Structure of ScaC Coh-R. flavefaciens FD-1 CohScaC in complex with its cognate Doc3 displayed an elliptical structure comprising nine β-strands arranged in two β-sheets that form a β-barrel with the classic "jelly roll" topology (Fig. 2). The two sheets are formed by β-strands 9, 1, 2, 7, and 4 on the noninteracting face and β-strands 8, 3, 6, and 5 on the Doc3-contacting face. All β-strands are antiparallel except for 1 and 9, which are parallel to each other and complete the jelly roll topology. Unusually, β-strand 8 is disrupted by a 17-residue-long β-flap that extends from Ala-131 to Ile-147. Furthermore, two α-helices are present between β-strands 4 and 5 and β-strands 6 and 7. Although these motifs are somewhat similar to those observed previously for the type II Cohs from C. thermocellum, Bacteroides cellulosolvens, and A. cellulolyticus (PDB codes 3BM3, 1TYJ, and 1QZN: SSM z-scores of 1.5, 6.3, and 7.2), structural similarity using SSM (16) revealed that the closest functionally relevant structural homologue of CohScaC was the type I Coh from A. cellulolyticus ScaC (PDB code 4UYP for ScaCCoh-ScaBDoc complex) with a z-score of 11.4, a root mean square deviation (r.m.s.d.) of 1.24 Å over 136 aligned residues out of a possible 146, and a total sequence identity of 30%. However, the α-helix connecting β-strands 4 and 5 is longer in R. flavefaciens CohScaC, and the Acetivibrio homologue lacks the large insertion identified in β-strand 8 of CohScaC (Fig. 3). C. thermocellum CipA Coh is the second closest structural homologue to CohScaC (PDB code 2ccl for CipA Coh complexed with a dockerin) with a z-score of 11, an r.m.s.d. of 1.39 Å over 130 aligned residues out of a possible 149, and a total sequence identity of 27%. CohScaC also shows homology with Cohs of C. cellulolyticum, Clostridium perfringens, B. cellulosolvens, and the Coh from R. flavefaciens ScaE (r.m.s.d. >1.8 Å; sequence identity <20%).
Secondary structure comparison of CohScaC with representative members of other Cohs with different specificities revealed the distinctive features of the R. flavefaciens ScaC cohesin to be the well defined α-helix connecting β-strands 4 and 5 and the β-flap disrupting β-strand 8 (Fig. 3B and supplemental Fig. S1).
Structure of R. flavefaciens FD-1 Group 3 Dockerin (Doc3)-Within the complex, Doc3 includes two α-helices arranged in an antiparallel orientation extending from Val-901 to Asn-913 (helix-1) and Ser-932 to His-943 (helix-3), whereas the loop connecting these structural elements contains a four-residue α-helix (helix-2) extending from Phe-919 to Ala-922 (Fig. 2). The overall tertiary structure of Doc3 is very similar to enzyme Docs from C. thermocellum (r.m.s.d. of 0.9 Å) and A. cellulolyticus (r.m.s.d. of 1.4 Å), which display a dual binding mode. Doc3 contains two Ca²⁺ ions coordinated by several amino acid residues, similar to the canonical EF-hand loop motif. Both of the Ca²⁺ ions have an n, n + 2, n + 4, n + 11 plus a water molecule pattern of coordination. Thus, the Ca²⁺ ion located at the N terminus is coordinated by the side chains of Asp-892, Asp-894, Asp-896, and Asp-903 (both the Oδ1 and Oδ2), the latter belonging to α-helix-1. The octahedral geometry of the coordination is completed by the main chain carbonyl of Glu-908 and one water molecule. The second Ca²⁺ site stabilizes the loop connecting α-helices 2 and 3 and is coordinated by the side chains of Asp-923, Asn-925, Asp-927, and Asp-934 (both the Oδ1 and Oδ2) as well as the carbonyl from Val-929 and a water molecule. A structural overlay of the two duplicated sequences observed in Doc3 indicated that they are structurally similar, with an r.m.s.d. of 0.8 Å for all main-chain atoms (Fig. 4A).
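The n, n + 2, n + 4, n + 11 spacing quoted for the side-chain ligands can be checked directly against the residue numbers listed above. The following is a minimal sketch for illustration only; the dictionary keys and function names are hypothetical and not part of the original analysis.

```python
# Side-chain Ca2+ ligands of Doc3, taken from the residue numbers in the text.
# The labels "site 1" and "site 2" are illustrative names only.
SITE_LIGANDS = {
    "site 1": [892, 894, 896, 903],  # Asp-892, Asp-894, Asp-896, Asp-903
    "site 2": [923, 925, 927, 934],  # Asp-923, Asn-925, Asp-927, Asp-934
}

# Offsets implied by the n, n + 2, n + 4, n + 11 pattern.
EXPECTED_OFFSETS = [0, 2, 4, 11]


def offsets(residues):
    """Return each residue number relative to the first ligand (position n)."""
    first = residues[0]
    return [number - first for number in residues]


def follows_pattern(residues):
    """True when the ligand spacing matches n, n + 2, n + 4, n + 11."""
    return offsets(residues) == EXPECTED_OFFSETS


for name, residues in SITE_LIGANDS.items():
    print(name, offsets(residues), follows_pattern(residues))
```

Both sites reproduce the stated spacing when the first aspartate of each loop is taken as position n.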
RfCohScaC-Doc3 Complex Interface-Doc3 interacted with the 8-3-6-5 sheet of the CohScaC β-sandwich, which presents a predominantly flat surface. However, the C terminus of β-strand 8 is elevated in relation to the 8-3-6-5 plane, which enables the N terminus of β-strand 9 to interact with Doc3. The β-flap on one side of the CohScaC 8-3-6-5 sheet and the α-helix, between β-strands 4 and 5, on the other side generate the appropriate topology at the surface of the Coh to accommodate Doc3. A large network of polar (Table 2) and hydrophobic interactions (supplemental Table S4) was identified at the complex interface. Their total number is greater than that observed in any related clostridial Coh-Doc complex that involves the recruitment of enzymes into clostridial cellulosomes (defined as type I Coh-Doc pairs (13,17)). In these dual binding mode Docs, the C-terminal region of one of the helices interacts with the Coh, whereas the entire length of the second interacting helix binds to the protein ligand. Doc binding can switch and, as a result of a 180° rotation of the Doc on the Coh surface, the Doc helix with the previous lower number of contacts can dominate Coh recognition, supporting the well described dual binding mode. In contrast, in the RfCohScaC-Doc3 complex the two Doc3 helices (helix 1 and helix 3) make similar contributions to CohScaC recognition (Table 2 and supplemental Table S4). The elevation of the α-helix located between β-strands 4 and 5 of CohScaC over the plane of the protein-interacting surface allows the entire Coh surface to be in closer proximity to both Doc α-helices. This observation, together with the lack of symmetry of the binding residues, which is described below, suggests that, in contrast to what was previously observed in several Coh-Doc complexes involving enzyme recruitment, Doc3 presents a single binding mode.

FIGURE 4. Significant differences between the two cohesin-binding interfaces do not allow the dual binding mode of type I dockerin from R. flavefaciens. A, overlay of the two dockerin repeats observed in Doc3 showing that the structures are similar (r.m.s.d. of 0.82 Å) in the main chain atoms but have considerable differences in the side chains. B, two interacting helices of Doc3, helix 1 (bright green) and helix 3 (blue), with the most important cohesin recognition residues displayed as sticks. C, comparison of the two putative binding surfaces by overlaying Doc3 with a version of itself rotated by 180° (pink) shows a lack of conservation in the key contacting residues. Lack of internal symmetry in Doc3 and the involvement of the two helices in cohesin recognition suggest that Doc3 displays a single cohesin-binding platform.
The interactions between α-helix-1 of Doc3 and CohScaC are dominated by Val-901, Phe-902, Ile-905, Arg-908, and Lys-909 of Doc3 and Gln-35 and Ser-92 of CohScaC (Fig. 2). The side chains of the Val-901/Phe-902 pair, occupying positions 11 and 12 of Doc3 that were previously suggested to modulate specificity in type I interactions (17), lie in the hydrophobic pocket formed by CohScaC residues Gly-36 and Gly-150. The hydrophobic character of the α-helix-1 interaction is reinforced by the interaction of Ile-905 with CohScaC Gln-35. The more distal α-helix-1 Arg-908/Lys-909 pair contributes to the hydrogen bond network with CohScaC by contacting residues Leu-72, Ser-92, and Glu-96, whereas the aliphatic side chains of these residues make comprehensive hydrophobic contacts with CohScaC Ala-94. In addition, the Nζ of Lys-909 contributes two important salt bridges with Oε2 of Glu-96 and Oδ1 of Asp-155 of the CohScaC. In α-helix-3, the contacts are dominated by the important salt bridge established between Nε2 of His-943 and Oδ1 of CohScaC Asp-77. His-943 also establishes important hydrogen bonds with Tyr-86 and Ser-88 of the Coh. In addition, the side chains of Leu-935 and Leu-942 make nonpolar contacts with CohScaC amino acid residues Leu-73 and Ile-90, respectively.
One of the notable features of CohScaC is the presence of an extensive loop disrupting β-strand 8. Residues located at this loop make a significant number of contacts with Doc3. Thus, Tyr-941 located in α-helix-3 of Doc3 is hydrogen bonded to the carbonyl group of CohScaC loop residue Glu-135, whereas Leu-942 makes a polar contact with CohScaC Arg-134. Furthermore, Val-899 located at the N terminus of Doc3 forms a hydrogen bond with CohScaC Thr-136. Additional van der Waals interactions are established between CohScaC loop residues Val-132 and Val-138 and Doc3 Gly-944 and Leu-949. Strikingly, residues located at the C terminus of CohScaC β-strand 8 and the N terminus of β-strand 9 make important contributions to Doc3 recognition. The twisted conformation of these two β-strands provides a platform that binds Doc3 amino acids located at the N-terminal loop. Hence, the Nζ of Lys-159 located in β-strand 9 makes three important hydrogen bonds with Asp-896, Glu-898, and Asp-900, which are Doc3 residues participating in the coordination of the calcium ion of the first dockerin repeat. In addition, β-strand 9 Ala-157 and the aliphatic chain of Lys-159 provide an important hydrophobic environment to accommodate the Doc3 side chain of Phe-902. Collectively, these observations suggest an extensive interface in the RfCohScaC-Doc3 complex not previously observed in type I Coh-Doc interactions.
Doc3 Presents a Single Coh-binding Interface-The binding thermodynamics of Doc3 to CohScaC were assessed by isothermal titration calorimetry (ITC) at 308 K, consistent with the approximate temperature of the rumen. The data, presented in Table 3 and exemplified in Fig. 5A, revealed a macromolecular association with a stoichiometry of 1:1 and a Kₐ of ~10⁷ M⁻¹, an affinity similar to other type I interactions. It is noteworthy that the apparent hydrophobic nature of the CohScaC-Doc interaction is associated with an enthalpy-driven interaction, a property previously observed in other Coh-Doc complexes. The importance of Doc3 Phe-902, Arg-908, and His-943 for CohScaC recognition was also probed by ITC. The data (Table 3 and Fig. 5A) revealed that alanine substitutions of residues Phe-902 and His-943 had no effect on the affinity of Doc3 for its Coh partner. In contrast, the R908A Doc3 derivative displayed a 10-fold lower affinity for the CohScaC (Table 3 and Fig. 5A).
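To put the association constants in energetic terms, the standard relation between the free energy of binding and the association constant, ΔG° = -RT ln Kₐ, can be evaluated at the experimental temperature of 308 K. The sketch below is illustrative only: it uses the approximate Kₐ quoted in the text rather than the fitted values of Table 3, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1
TEMPERATURE = 308.0         # K, the ITC temperature stated in the text


def delta_g_kj_per_mol(ka, temperature=TEMPERATURE):
    """Standard free energy of binding (kJ/mol) from Ka via dG = -RT ln(Ka)."""
    return -GAS_CONSTANT * temperature * math.log(ka) / 1000.0


# Approximate wild-type affinity from the text (Ka ~ 1e7 M^-1) and a
# hypothetical 10-fold weaker binder, as described for the R908A derivative.
wild_type = delta_g_kj_per_mol(1e7)
tenfold_weaker = delta_g_kj_per_mol(1e6)

print(f"dG (Ka = 1e7 M^-1): {wild_type:.1f} kJ/mol")
print(f"dG (Ka = 1e6 M^-1): {tenfold_weaker:.1f} kJ/mol")
print(f"Cost of a 10-fold loss in Ka: {tenfold_weaker - wild_type:.1f} kJ/mol")
```

Under these assumptions a Kₐ of 10⁷ M⁻¹ corresponds to roughly -41 kJ/mol, and a 10-fold drop in affinity costs about 5.9 kJ/mol, which is RT ln 10 at 308 K.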
TABLE 3 Thermodynamics of interaction between wild type CohScaC and wild type and mutant variants of Doc3
The last row refers to the interaction between wild type (WT) Doc3 and CohScaC without the flap insertion (NF). All thermodynamic parameters were determined at 308 K. Nb means no binding.

The fact that single amino acid substitutions at the Coh-binding surface of Doc3 had a marginal or no effect on Coh recognition may be accounted for by at least two explanations. 1) Doc3 displays a dual binding mode typical of other Docs, and mutation of a single residue has no effect on affinity as it leads to a 180° rotation of the Doc, and the presence of the mutated residue is compensated by its 2-fold symmetry-related counterpart. 2) Doc3 presents a single CohScaC-binding platform so extensive that single substitutions have marginal effects on affinity. To distinguish between these two possibilities, we probed the internal symmetry of Doc3 by overlaying its structure with the 2-fold related derivative using the MatchMaker procedure from Chimera (18), which showed an r.m.s.d. of 0.36 Å for 115 atoms (Fig. 4, A and B). The superposition highlights the lack of conservation in the contacting residues when the two putative Coh-binding surfaces were compared (Fig. 4C). For example, the key Val-901/Phe-902 pair located at positions 11 and 12 of the first repeat is replaced by Ser-932 and Asp-933, whereas His-943, which dominates the hydrogen bond network with the cohesin at the C-terminal α-helix, superposes with Glu-912 (Fig. 4C). The lack of internal symmetry in Doc3 and the involvement of α-helices 1 and 3 in Coh recognition confirm that Doc3 displays a single Coh-binding platform. Thus, the importance of Phe-902, Arg-908, and His-943 in binding of CohScaC was investigated by probing the capacity of double and triple mutant derivatives to recognize the CohScaC. The data (see Table 3) suggest that although the RfCohScaC-Doc3 complex presents an extensive protein-protein interface, Doc3 Phe-902, Arg-908, and His-943 dominate Coh recognition, as replacement of these residues by Ala in double and triple mutants significantly diminishes or abrogates binding.
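The symmetry argument can be summarized by tabulating the key contact residues against the residues that occupy the 2-fold related positions. The following minimal sketch uses only the three pairs named in the text; the data structure and function names are hypothetical and stand in for the full structural superposition.

```python
# (key contact residue, residue at the 2-fold related position), as named
# in the text for the Doc3 superposition.
SYMMETRY_PAIRS = [
    ("Val-901", "Ser-932"),
    ("Phe-902", "Asp-933"),
    ("His-943", "Glu-912"),
]


def residue_type(label):
    """Extract the three-letter residue name from a label such as 'Val-901'."""
    return label.split("-")[0]


def conserved_fraction(pairs):
    """Fraction of pairs in which both positions carry the same residue type."""
    same = sum(residue_type(a) == residue_type(b) for a, b in pairs)
    return same / len(pairs)


for contact, counterpart in SYMMETRY_PAIRS:
    status = "conserved" if residue_type(contact) == residue_type(counterpart) else "not conserved"
    print(f"{contact} vs {counterpart}: {status}")

print("Conserved fraction:", conserved_fraction(SYMMETRY_PAIRS))
```

None of the three key positions is conserved across the putative 2-fold axis, which is the observation that argues against a dual binding mode.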
Further support for the single binding mode involving several interacting residues is provided by the observation that removal of the CohScaC loop that interrupts β-strand 8, which makes several contacts with Doc3, had no influence on affinity (Table 3).

… Table 3. B, non-denaturing gel electrophoretic analysis of CohScaC-Doc3 interaction. In the 1st lane, both gels were loaded with the cohesin (Coh). Adjacent lanes were loaded with the dockerin (D3) and with both the cohesin and dockerin modules together after a 60-min incubation at equimolar concentrations. The appearance of a band with a different migration pattern in lanes containing the complex represents a positive result (e.g. D3 WT), whereas a negative result (e.g. D3 FR) is given by the appearance of only the individual dockerin and cohesin bands. A faint cohesin band is seen even in the lanes where there is complex formation; it results from excess cohesin, probably because not all the dockerin in solution is active.

R. flavefaciens FD-1 Group 3 and Group 6 Docs Present a Non-dynamic Binding Mode to CohScaC-Recent data suggest that R. flavefaciens FD-1 groups 3 and 6 Docs display tight specificity for CohScaC (8). The prevalence of xylan (GH10, GH11, and GH43) and pectin (PL11, CE1, CE3, and CE15)-degrading catalytic modules (and associated carbohydrate-binding CBM22 and CBM6 modules) in R. flavefaciens FD-1 cellulosomal proteins containing group 3 Docs suggests that this subset of enzymes is particularly suited to deconstruct hemicellulose and pectin (supplemental Fig. S2). In addition, group 6 dockerins are appended to a broader range of enzymes that include GH5, GH26, GH43, GH44, GH97, PL1, PL11, CE1, CE3, and CE4. The structure of the RfCohScaC-Doc3 complex provides an opportunity to identify the residues that modulate ligand specificity within these two dockerin groups.
All 20 R. flavefaciens group 3 Docs were expressed and purified. From a total of 19 recombinant Docs (one of the dockerins was insoluble when produced in E. coli), 16 were shown to bind to CohScaC using ITC (supplemental Fig. S3) and nondenaturing gel electrophoresis (supplemental Fig. S4). Strikingly, the data (Table 4 and Fig. 5) revealed that the affinities of the group 3 Docs ranged from Kₐ <10⁵ to 10⁸ M⁻¹. Inspection of the alignment of the R. flavefaciens FD-1 group 3 Docs revealed that the three members that did not bind CohScaC lack the three residues that were shown to dominate CohScaC recognition (Fig. 6). Moreover, in two cases (Doc381 and Doc1425) the Docs would appear to recognize CohScaC in the reverse orientation relative to Doc3, because the three residues involved in cohesin recognition (i.e. Phe-902, Arg-908, and His-943) are identified in the opposite helices to those of Doc3. To verify this possibility, Phe and Arg residues observed in α-helix-3 of Doc1425 were mutated to Ala, and the affinity of the mutant Doc for CohScaC was determined. The data, presented in Table 4, indicated that the F46A/R52A mutant displays no affinity for CohScaC, suggesting that α-helix-3 should occupy the position of Doc3 α-helix-1 during Coh recognition. Variation in CohScaC affinities may be explained by replacement of at least one of the three residues important for Coh recognition by a non-conserved homologue. For example, Doc3729 and Doc3865, which display the lowest affinities for CohScaC, have His replaced by a Ser and Phe substituted by a Met (Fig. 6).

FIGURE 6. Alignment of group 3 dockerins. All group 3 dockerins were aligned using Clustal Omega multiple sequence alignment software and are organized according to their affinity to CohScaC, from the highest Kₐ value to the lowest, as determined by ITC. Docs 381 and 1425 were aligned in opposite orientation relative to the other members by switching the N-terminal half with the C-terminal half of the sequence. This resulted in the N-terminal interface (blue residues) of Docs 381 and 1425 being perfectly aligned with the C-terminal interface (green residues) of the remaining group 3 members and vice versa, supporting the theory that they will bind the cohesin in an opposite orientation. The top line matches the protein secondary structure (red cylinders) to the primary structures, as observed in the Doc3 structure, and also points to the calcium coordinating residues (blue triangles). All residues involved in the Doc3 interaction with CohScaC are highlighted according to the color code displayed at the bottom. Conservation of key residues for cohesin recognition along the group is highlighted with black boxes. To some extent, this conservation pattern seems to correlate to the CohScaC affinity profile of the group.

TABLE 4 Thermodynamics of interaction between wild type CohScaC and dockerins from groups 3 and 6
The last row of group 3 refers to the interaction between the F46A/R52A mutant variant of Doc1425 and CohScaC. All thermodynamic parameters were determined at 308 K.

DECEMBER 23, 2016 • VOLUME 291 • NUMBER 52

Similarly to what was observed for group 3 Docs, the majority of group 6 Doc-containing enzymes were previously annotated as hemicellulases (7). To understand why groups 3 and 6 Docs have similar Coh specificities, six representative members of R. flavefaciens FD-1 group 6 Docs (out of a total of 45) were produced recombinantly, and their affinities for CohScaC were probed through ITC. The thermodynamic data, displayed in Table 4 as well as the binding thermograms presented in Fig. 7A, revealed that group 6 Docs also display significant differences in affinity for CohScaC. For example, the affinity of Doc903 was beyond the detectable limit of the calorimeter, suggesting a Kₐ >10⁹ M⁻¹, whereas the other five Docs displayed affinities that were either too low to be quantified (Doc1804) or had a Kₐ value that ranged from 10⁵ to 10⁷ M⁻¹. To rationalize these observations, the group 6 Docs were aligned with Doc3 (Fig. 7B). Residues required for CohScaC recognition are observed in the opposite helices when compared with Doc3, suggesting that in group 6 Docs the α-helix-3 interacts with CohScaC similarly to the α-helix-1 of Doc3. In addition, insertion of a Ser residue at position 11, which is usually occupied by a hydrophobic amino acid, may abrogate binding of Doc1804 to CohScaC.
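The half-swap used to align dockerins that bind in the opposite orientation (switching the N-terminal half of a sequence with its C-terminal half) amounts to a simple string rotation. The sketch below is illustrative only: the toy sequence and the split point are hypothetical, since the actual dockerin sequences and repeat boundaries are not reproduced in the text.

```python
def swap_halves(sequence, split_index):
    """Return the sequence with its N- and C-terminal parts exchanged.

    split_index is the position where the second repeat is assumed to start;
    the real boundary must be taken from the alignment, not from this sketch.
    """
    return sequence[split_index:] + sequence[:split_index]


# Hypothetical toy sequence: repeat 1 carries "FR", repeat 2 carries "H".
toy_dockerin = "AAAAFRAAAA" + "GGGGGGHGGG"
rotated = swap_halves(toy_dockerin, 10)

print(toy_dockerin)
print(rotated)
```

After the swap, residues from the second repeat occupy the alignment columns of the first repeat and vice versa, which is how recognition residues located in opposite helices can be compared column by column.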

Structure of R. flavefaciens Cohesin-Dockerin Complex
Conclusions-In nature, Coh-Doc interactions are essential for cellulosome assembly by providing a molecular base for the integration of microbial enzymes onto a primary scaffoldin. Enzyme-containing Docs present a dual binding mode resulting from the presence of two identical Coh-binding faces. Data presented here report a notable exception to this general rule by analyzing the incorporation of cellulosomal enzymes into R. flavefaciens cellulosomes through adaptor scaffoldins such as ScaC. Previously, R. flavefaciens group 3 and group 6 Docs were shown to specifically recognize the single Coh of scaffoldin ScaC. Here, the structure of a group 3 Doc, Doc3, in complex with CohScaC, revealed the presence of a single Coh-binding interface that involves both Doc helices. These observations contrast with the dual binding mode mechanism previously identified in Docs used by the majority of cellulosome-producing bacteria, such as C. thermocellum, A. cellulolyticus, and C. cellulolyticum, to recruit cellulosomal enzymes into primary scaffoldins. Lack of internal symmetry in groups 3 and 6 R. flavefaciens Docs generated an unconventional single protein-binding interface that specifically interacts with the Coh of the ScaC adaptor scaffoldin. Notably, groups 3 and 6 Docs were found to be predominantly appended to hemicellulases, suggesting that R. flavefaciens has evolved an original mechanism to recruit this subset of enzymes, which are critical to plant cell wall degradation, to the cellulosome. Thus, instead of binding a significant array of hemicellulases directly to primary scaffoldins, enzymes affixed with group 3 and 6 Docs are mobilized via an adaptor scaffoldin to the highly intricate bacterial nanomachines produced by R. flavefaciens to degrade recalcitrant polysaccharides. This observation suggests that hemicellulases may either act freely during carbohydrate hydrolysis, in the absence of ScaC, or be recruited to cellulosomes once the ScaC adaptor scaffoldin is expressed. This hypothesis is currently under investigation and may indicate that R. flavefaciens has developed highly elaborate mechanisms to fine-tune plant cell wall degradation.

FIGURE 7. Binding affinity of group 6 dockerins to CohScaC determined by ITC. Representative binding isotherms are displayed in A: CohScaC/Doc903, CohScaC/Doc1369, and CohScaC/Doc1965. The upper part of each panel shows the raw heats of binding, and the lower parts include the integrated heats after correction for heat of dilution. The curve represents the best fit to a single-site binding model. The corresponding thermodynamic parameters are shown in Table 4. B, alignment of tested group 6 dockerins with a version of Doc3 in which the C-terminal half was switched with the N-terminal half (Doc3_180), resulting in the N-terminal interface (blue residues) of Doc3 being perfectly aligned with the C-terminal interface (green residues) of the group 6 members and vice versa, supporting the theory that they will bind the cohesin in opposite orientation. Residues involved in Ca²⁺ binding are pointed out by the blue triangles at the top. All residues involved in the Doc3 interaction with CohScaC are highlighted according to the color code displayed at the bottom. Conservation of key residues for cohesin recognition is highlighted with black boxes.

Experimental Procedures
Gene Synthesis and DNA Cloning-Docs are inherently unstable when produced in E. coli. To promote Doc stability, R. flavefaciens FD-1 Doc3 of protein ZP_06143424 (residues 888-952) was co-expressed in vivo with CohScaC. The immediate binding of Doc3 to CohScaC confers the necessary Doc stabilization. The genes encoding the two proteins were designed with a codon usage optimized to maximize expression in E. coli, synthesized in vitro (NZYTech Ltd., Lisbon, Portugal), and cloned into pET28a (Merck Millipore, Germany) under the control of separate T7 promoters. The Doc3-encoding gene was positioned at the 5′ end and the CohScaC-encoding gene at the 3′ end of the artificial DNA. A T7 terminator sequence (to terminate transcription of the Doc gene) and a T7 promoter sequence (to control transcription of the Coh gene) were incorporated between the sequences of the two genes. This construct contained specifically tailored NheI and NcoI recognition sites at the 5′ end and XhoI and SalI at the 3′ end to allow subcloning the nucleic acid into pET-28a (Merck Millipore) such that the sequence encoding a six-residue His tag could be introduced either at the N terminus of the Doc (through digestion with NheI and SalI, incorporating the additional sequence MGSSHHHHHHSSGLVPRGSHMAS at the N terminus of the Doc3) or at the C terminus of the CohScaC (by cutting with NcoI and XhoI, which incorporates the additional sequence LEHHHHHH at the C terminus of the Coh). Thus, as a result of this strategy, two pET28a plasmid derivatives were produced as follows: pET28DtC, with the engineered tag at the dockerin, and pET28DCt, where the engineered tag is attached to the Coh. The two plasmids were used to express RfCohScaC-Doc3 complexes in E. coli. Recombinant Doc3 and CohScaC primary structures are presented in supplemental Table S1.
To produce recombinant Cohs and Docs individually, an ELISA-based system designed to probe Coh-Doc affinities, which requires fusion of the modules to a xylanase or a carbohydrate-binding module, was selected, as it allows production of highly stable and functional Coh and Doc derivatives (19). Thus, sequences encoding Doc3 and CohScaC were amplified from R. flavefaciens FD-1 genomic DNA by PCR, using NZYProof polymerase (NZYTech Ltd., Portugal) and the primers shown in supplemental Table S2. After gel purification, the Doc3-encoding amplicon was inserted into a xylanase-Doc cassette in pET9d plasmid after digestion with KpnI and BamHI and ligation with T4 ligase. The resulting expressed product constitutes a His-tagged Doc3 fused to xylanase T-6 from Geobacillus stearothermophilus at the N terminus of the polyhistidine tag (XynDoc3). The CohScaC-encoding gene was cloned into a CBM-Coh cassette in pET28a after digestion with BamHI and XhoI restriction enzymes. This resulted in a His-tagged CohScaC recombinant derivative fused to a CBM3a from the C. thermocellum scaffoldin CipA (CBMCohScaC) (20).
To identify the Doc residues that modulate Coh recognition, several XynDoc3 protein derivatives were produced using site-directed mutagenesis. PCR amplification of the Doc-containing plasmid, using the primers presented in supplemental Table S2, allowed the production of seven Doc3 protein derivatives, namely F902A, R908A, H943A, F902A/R908A, F902A/H943A, R908A/H943A, and F902A/R908A/H943A. Each of the newly generated genes was fully sequenced to confirm that only the desired mutations had been introduced into the nucleic acid.
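The seven derivatives correspond to every non-empty combination of the three single substitutions (three single, three double, and one triple mutant). The short sketch below simply enumerates that set for bookkeeping purposes; it is an illustration rather than part of the protocol, and the function name is hypothetical.

```python
from itertools import combinations


def mutant_combinations(substitutions):
    """Return every non-empty combination of substitutions, joined with '/'."""
    names = []
    # Singles first, then doubles, and so on up to the full combination.
    for size in range(1, len(substitutions) + 1):
        for combo in combinations(substitutions, size):
            names.append("/".join(combo))
    return names


# The three alanine substitutions named in the text.
doc3_variants = mutant_combinations(["F902A", "R908A", "H943A"])
print(len(doc3_variants))
print(doc3_variants)
```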
To remove the 15-residue β-flap present in β-strand 8, an overlapping PCR protocol was carried out using the plasmid encoding CBMCohScaC as template. The two gene regions on each side of the 15-residue coding sequence (5′ and 3′ fragments) were amplified in two separate reactions using the primers shown in supplemental Table S2. This resulted in the 3′ end of the amplified 5′ fragment being complementary to the 5′ end of the amplified 3′ fragment. The two fragments were then mixed at equimolar concentrations (0.15 pmol) and used as the template for a third PCR using the forward primer from the first reaction and the reverse primer from the second. The resulting product was cloned back into pET21a by cutting with NheI/XhoI restriction enzymes and sequenced to confirm the integrity of the recombinant gene. The concentration of each fragment was estimated in a NanoDrop 2000c spectrophotometer (Thermo Scientific).
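Mixing the two fragments in equimolar amounts requires converting the mass concentration read on the spectrophotometer into moles. A minimal sketch of that conversion follows, assuming an average mass of about 660 g/mol per base pair of double-stranded DNA (a widely used approximation, not a value given in this work). The fragment length and stock concentration in the usage lines are hypothetical, and the function names are illustrative only.

```python
# Approximate average molar mass of one base pair of dsDNA (assumption).
AVG_BP_MASS_G_PER_MOL = 660.0


def ng_for_pmol(pmol, length_bp):
    """Mass in ng of `pmol` picomoles of a dsDNA fragment `length_bp` long."""
    # pmol x (g/mol) gives picograms; divide by 1000 to obtain nanograms.
    return pmol * length_bp * AVG_BP_MASS_G_PER_MOL / 1000.0


def volume_ul(pmol, length_bp, conc_ng_per_ul):
    """Volume in microlitres of a stock that contains `pmol` of the fragment."""
    return ng_for_pmol(pmol, length_bp) / conc_ng_per_ul


# Hypothetical example: 0.15 pmol of a 300-bp fragment from a 15 ng/ul stock.
mass_ng = ng_for_pmol(0.15, 300)
vol_ul = volume_ul(0.15, 300, 15.0)
print(f"{mass_ng:.1f} ng in {vol_ul:.2f} ul")
```

Because the required mass scales with fragment length, equal masses of a short and a long fragment would not be equimolar, which is why the conversion is done per fragment.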
The genes encoding several groups 3 and 6 Docs were cloned using the Gateway recombination cloning technology (Thermo Scientific). Sequences were amplified by PCR using R. flavefaciens FD-1 genomic DNA as template and using primers with engineered ends that allow site-specific recombination without the need for restriction enzymes (supplemental Table S3). Amplified genes were inserted into pDONR201 entry vector and from there into the protein expression destination vector pETG-20A, according to the manufacturer's protocol (Thermo Scientific). The genes are under the control of a T7 promoter. pETG-20A allowed the fusion of an N-terminal thioredoxin A (21) and an internal His tag to the recombinant Doc to promote protein stability and solubility.
Expression and Purification of Recombinant Proteins-Preliminary expression screens revealed that when the polyhistidine tag was located at the Doc N-terminal end in RfCohScaC-Doc3 complexes, the expression levels of both Coh and Doc were higher. Tagging the Coh resulted in the accumulation of high levels of unbound Coh in the purification product, suggesting that Coh was expressed at higher levels than Doc. Consequently, the plasmid pET28DtC was used to transform E. coli BL21 (DE3) cells to produce the RfCohScaC-Doc3 complex in large quantities. Transformed E. coli cells were grown at 37°C to an A600 of 0.5. Recombinant protein expression was induced by the addition of 1 mM isopropyl β-D-1-thiogalactopyranoside followed by incubation at 19°C for 16 h. Cells were harvested by a 15-min centrifugation at 5000 × g and resuspended in 20 ml of IMAC binding buffer (50 mM HEPES, pH 7.5, 10 mM imidazole, 1 M NaCl, 5 mM CaCl₂). Cells were then disrupted by sonication, and the cell-free supernatant was recovered by 30 min of centrifugation at 15,000 × g. After loading the soluble fraction into a HisTrap™ nickel-charged Sepharose column (GE Healthcare, UK), initial purification was carried out by IMAC in an FPLC system (GE Healthcare, UK) using conventional protocols with a 35 mM imidazole wash and a 35-300 mM imidazole gradient. The buffer of all recovered fractions containing the purified cohesin-dockerin complex was exchanged into 50 mM HEPES, pH 7.5, containing 200 mM NaCl and 5 mM CaCl₂ using a PD-10 Sephadex G-25 M gel-filtration column (Amersham, UK). A further purification step by gel-filtration chromatography was performed by loading the samples onto a HiLoad 16/60 Superdex 75 (GE Healthcare, UK) at a flow rate of 1 ml min⁻¹. Fractions containing the purified complex were then concentrated with Amicon Ultra-15 centrifugal devices with a 10-kDa cutoff membrane (Millipore) and washed three times with molecular biology grade water (Sigma) containing 0.5 mM CaCl₂.
The protein concentration was estimated in a NanoDrop 2000c spectrophotometer (Thermo Scientific) using a molar extinction coefficient (ε) of 26,025 M⁻¹ cm⁻¹. The final protein concentration was adjusted to 81 mg ml⁻¹ in molecular biology grade water containing 0.5 mM CaCl₂. The purity and molecular mass of the recombinant complex were confirmed by 14% (w/v) SDS-PAGE.
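Estimating concentration from an extinction coefficient relies on the Beer-Lambert relation, A = ε·c·l. The sketch below shows the arithmetic under stated assumptions: the extinction coefficient is the one quoted above, whereas the absorbance reading, the 1-cm path length, and the molar mass used in the usage lines are hypothetical placeholders, since those values are not reported here.

```python
# Molar extinction coefficient of the complex, as quoted in the text.
EPSILON_COMPLEX = 26025.0  # M^-1 cm^-1


def molar_concentration(absorbance, epsilon=EPSILON_COMPLEX, path_cm=1.0):
    """Beer-Lambert: c = A / (epsilon * l), returned in mol/l."""
    return absorbance / (epsilon * path_cm)


def mass_concentration_mg_ml(absorbance, molar_mass_da,
                             epsilon=EPSILON_COMPLEX, path_cm=1.0):
    """Convert the molar concentration to mg/ml (numerically equal to g/l)."""
    return molar_concentration(absorbance, epsilon, path_cm) * molar_mass_da


# Hypothetical reading and molar mass, for illustration only.
reading = 0.5205
assumed_mass_da = 30000.0
print(molar_concentration(reading))
print(mass_concentration_mg_ml(reading, assumed_mass_da))
```

A concentrated stock would normally be diluted into the linear range of the instrument before reading, with the dilution factor applied to the result.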
Groups 3 and 6 Docs, CBMCohScaC, XynDoc3, and the respective protein derivatives used in ITC and native PAGE experiments were expressed as described above and purified by IMAC using nickel-charged Sepharose His GraviTrap gravity-flow columns (GE Healthcare, UK). After IMAC, the recombinant cohesin and dockerins were buffer-exchanged to 50 mM HEPES, pH 7.5, 0.5 mM CaCl₂, and 0.5 mM tris(2-carboxyethyl)phosphine using PD-10 Sephadex G-25 M gel filtration columns (GE Healthcare, UK).
Nondenaturing Gel Electrophoresis-For the nondenaturing gel electrophoresis experiments, each of the XynDoc3 variants, at a concentration of 30 μM, was incubated in the presence and absence of 30 μM ScaCCoh for 30 min at room temperature and separated on a 10% native polyacrylamide gel. Electrophoresis was carried out at room temperature. The gels were stained with Coomassie Blue. Complex formation was detected by the presence of an additional band displaying a lower electrophoretic mobility than the individual modules.
Isothermal Titration Calorimetry-All ITC experiments were carried out at 308 K. The purified XynDoc3 variants and CohScaC were diluted to the required concentrations and filtered using a 0.45-μm syringe filter (PALL). During titrations, the Doc constructs were stirred at 307 rpm in the reaction cell and titrated with 28 successive 10-μl injections of CohScaC at 220-s intervals. Integrated heat effects, after correction for heats of dilution, were analyzed by nonlinear regression using a single-site model (Microcal ORIGIN version 7.0, Microcal Software). The fitted data yielded the association constant (K_A) and the enthalpy of binding (ΔH). Other thermodynamic parameters were calculated using the standard thermodynamic equation, −RT ln K_A = ΔG = ΔH − TΔS.
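The derived parameters follow directly from the fitted K_A and ΔH through the standard relation −RT ln K_A = ΔG = ΔH − TΔS. A minimal sketch of that calculation at the experimental temperature of 308 K is given below. The K_A and ΔH values in the usage lines are hypothetical and are not results from this study, and the function name is illustrative.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def derive_thermodynamics(k_a, delta_h, temperature=308.0):
    """Return (delta_g, t_delta_s, delta_s) from K_A (M^-1) and dH (J/mol).

    delta_g and t_delta_s are in J/mol; delta_s is in J mol^-1 K^-1.
    """
    delta_g = -R * temperature * math.log(k_a)  # dG = -RT ln K_A
    t_delta_s = delta_h - delta_g               # from dG = dH - T dS
    delta_s = t_delta_s / temperature
    return delta_g, t_delta_s, delta_s


# Hypothetical fitted values, for illustration only.
dg, tds, ds = derive_thermodynamics(k_a=1.0e8, delta_h=-50000.0)
print(f"dG = {dg / 1000:.2f} kJ/mol, TdS = {tds / 1000:.2f} kJ/mol")
```

Only K_A and ΔH come from the fit; ΔG and TΔS are derived quantities, so errors in the fitted values propagate into both.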
X-ray Crystallography, Structural Determination, and Refinement-The crystallization conditions were set up using the sitting-drop vapor-diffusion method with an Oryx8 robotic nanodrop dispensing system (Douglas Instruments, UK (22)). The commercial kits Crystal Screen, Crystal Screen 2, PEG/Ion, and PEG/Ion 2 (Hampton Research, CA), JCSG+ HT96 (Molecular Dimensions, UK), and an in-house screen (80 factorial) were used for the screening. Precisely, 0.7-μl drops of RfCohScaC-Doc3 at 40 and 81 mg ml⁻¹ were mixed with 0.7 μl of reservoir solution at room temperature, with each well containing 50 μl of the crystallization solution. The resulting plates were then stored at 292 K. Crystal formation was observed in 35 conditions after a period of ~30 days (maximum dimensions ~100 × 20 × 20 μm). All the crystals were obtained from the initial screens. These crystals were cryoprotected with mother solution containing 20-30% glycerol or with 100% Paratone-N (Hampton Research) and flash-cooled in liquid nitrogen.
Data were collected on beamline I04 at the Diamond Light Source, Harwell, UK, using a PILATUS 6M detector (Dectris Ltd.) from crystals cooled to 100 K using a Cryostream (Oxford Cryosystems Ltd.). A systematic grid search was carried out on all of these crystals to select the best diffracting part of each crystal. EDNA (23) and iMosflm (24) were used for strategy calculation during data collection. All data sets were processed using the Fast_dp and xia2 (25) packages, which use the programs XDS (26) and POINTLESS and SCALA (27) from the CCP4 suite (28). Data collection statistics are given in Table 1.
The best diffracting crystals were those formed in condition JCSG+ 2.33 (0.1 M potassium thiocyanate, 30% (w/v) PEG 2000 MME); they diffracted to a resolution of 2.16 Å and belonged to the orthorhombic space group P2₁2₁2₁. BALBES was used to carry out molecular replacement (29). The best solution was found using the type I cohesin-dockerin complex from C. thermocellum (PDB entry 2ccl (12)), whose cohesin and dockerin displayed sequence identities of 30.0 and 31.7%, respectively, giving an R factor and Rfree of 24.45 and 30.58%, respectively, and a Q-factor of 0.506 after REFMAC5 (30) at the end of the BALBES run. Two copies of the heterodimeric RfCohScaC-Doc3 complex are present in the asymmetric unit. This model was adjusted and refined using REFMAC5 and PDB REDO (31) interspersed with model adjustment in COOT to give the final model (Protein Data Bank code 5LXV, see Table 1). The final round of refinement was performed using the TLS/restrained refinement procedure with each module as a single group. The root mean square deviations of bond lengths, bond angles, torsion angles, and other indicators were continuously monitored using validation tools in COOT and MOLPROBITY. A summary of the refinement statistics is shown in Table 1.