Pinpoint Mapping of Recognition Residues on the Cohesin Surface by Progressive Homologue Swapping*

The high affinity cohesin-dockerin interaction dictates the suprastructural assembly of the multienzyme cellulosome complex. The connection between affinity and species specificity was studied by exploring the recognition properties of two structurally related cohesin species of divergent specificity. The cohesins were examined by progressive rounds of swapping, in which corresponding homologous stretches were interchanged. The specificity of binding of the resultant chimeric cohesins was determined by enzyme-linked affinity assay and complementary protein microarray. In succeeding rounds, swapped segments were systematically contracted, according to the binding behavior of previously generated chimeras. In the fourth and final round we discerned three residues, reputedly involved in interspecies binding specificity. By replacing only these three residues, we were able to convert the specificity of the resultant mutated cohesin, which bound preferentially to the rival dockerin with ∼20% capacity of the wild-type interaction. These residues represent but 3 of the 16 contact residues that participate in the cohesin-dockerin interaction. This approach allowed us to differentiate, in a structure-independent fashion, between residues critical for interspecies recognition and binding residues per se.

The incorporation of enzyme subunits into the cellulosome complex is mediated by a high affinity protein-protein interaction, in which cohesin modules, located on a cellulosomal scaffoldin subunit, bind tenaciously to complementary dockerin domains, borne by the enzyme subunits (1-3). The precise molecular basis for this interaction has been the subject of intensive investigation during the past several years. Within a given species, the cohesin-dockerin interaction generally appears to lack specificity, in that the various cohesins of the primary scaffoldin tend to recognize the dockerins derived from the different enzyme subunits (4-6). On the other hand, in several cases the cohesins from one species fail to recognize dockerins from another bacterium (5,7). The biological significance of the observed intraspecies fidelity versus cross-species variance remains to be elucidated.
Crystal structures of cohesins from Clostridium thermocellum and from Clostridium cellulolyticum were first reported, from which the overall shape and topology were determined (8-10). On the basis of various analyses of the cohesin surface (comparative bioinformatics, homology modeling, electrostatic potential), the approximate location of the dockerin-binding surface was proposed (8). The corresponding face of the proposed binding surface comprised contiguous β-strands 8, 3, 6, and 5 of the all-β jelly-roll structure. The general fold is representative of an important type of stable protein fold that characterizes different families of protein modules, including carbohydrate-binding modules and certain carbohydrases (11-13). On the dockerin counterpart, a series of putative recognition determinants were examined by site-directed mutagenesis along with secondary structure predictions of the dockerin fold (7,14,15). The subsequent solution structure of the dockerin molecule corroborated the structure predictions, and a similar set of residues was proposed to participate in the binding interaction with the cohesin molecule (16,17). Bioinformatics-based site-directed mutagenesis studies of the cohesin module (18-20) failed to clarify the cohesin residues that play a definitive role in dockerin binding.
The very high affinity of both cohesins (K_D < 10⁻¹⁰ M) to their matching dockerins (15,21-23) suggests that numerous intermodular interactions occur between the cohesin and dockerin. On the other hand, we reasoned that not all of the interactions would be necessary for interspecies specificity. We therefore embarked upon an alternative approach, designed explicitly to provide insight into the distinctive binding characteristics of the cohesin module, with the intention of identifying residues that confer species specificity. The approach used in this work involved progressive swapping of corresponding homologous stretches of the cohesin modules from C. thermocellum and C. cellulolyticum. The resultant chimeric cohesins were then screened using an enzyme-linked affinity assay and complementary protein microarray to provide insight into the residue(s) involved in the binding specificity as opposed to general binding. The progressive nature of this approach stems from the linkage between the experimentally determined specificity of the interaction and the identity of the homologous segments. With each successive generation of homologue swapping, the reactive segments were constricted until only three selected residues of the C. cellulolyticum cohesin were mutated, resulting in a viable cohesin mutant that preferentially recognized the rival dockerin. During the course of this study, a crystal structure for the cohesin-dockerin heterodimer was elucidated from the C. thermocellum cellulosome (24). The putative specificity residues implicated in the present study represent but 3 of the 16 contact residues that participate in complex formation. The results of the present study may allow us to differentiate between residues critical for interspecies recognition and binding residues per se.

EXPERIMENTAL PROCEDURES
General DNA Manipulation-Genomic DNA was prepared from C. cellulolyticum ATCC 35319 and C. thermocellum YS as described previously (4,7,21) and was used as template for PCR amplification of the carbohydrate-binding module (CBM), 1 cohesin, and dockerin modules. The QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA) was used for small-scale plasmid isolation. The QIAquick PCR purification kit (Qiagen) was used for purification of PCR products and the QIAquick gel extraction kit (Qiagen) for elution of DNA from agarose gels. All restriction enzymes (New England Biolabs, Beverly, MA), T7 DNA polymerase (Takara, Otsu, Shiga, Japan), alkaline phosphatase, and T4 DNA ligase (Roche Applied Science, Mannheim, Germany) were used as recommended by the manufacturer. Transformation of Escherichia coli strains was performed as described previously (25). Sequencing was performed using an ABI 3700 DNA sequencing apparatus (PerkinElmer Life Sciences, Boston, MA).
Construction of Overexpression Plasmids-The dockerin modules of celA of C. cellulolyticum (Doc C ) and celS of C. thermocellum (Doc T ) were amplified using the primers DA-F/DA-R and DS-F/DS-R, respectively (the sequences of the primers are listed in Table SI, see Supplementary Material). Both PCR products were digested with KpnI and BamHI and cloned separately into the KpnI/BamHI-linearized plasmid pMalc2e (New England Biolabs), which resulted in the plasmids pMalc2e-doc C and pMalc2e-doc T , respectively.
In order to prepare an expression vector for production of CBM fusion proteins, the gene encoding the C. thermocellum CBM3 module (4) was first inserted upstream of the multiple cloning site of pET28a. The cbm3 gene was amplified using the primers CM-F/CM-R (Table SI, see Supplementary Material). The desired PCR fragment was cleaved using BspHI and NcoI and ligated into an NcoI-linearized pET28a plasmid (Novagen, Madison, WI). The clone with cbm3 in the correct orientation was identified by sequencing and named plasmid pET28a-cbm3.
The genes encoding the C. cellulolyticum and C. thermocellum cohesin modules were amplified using primers Fc/Rc and Ft/Rt, respectively (Table SI, see Supplementary Material). The PCR fragments were cut with NcoI and XhoI and ligated into an NcoI/XhoI-linearized pET28a-cbm3 plasmid, resulting in plasmids pSW-coh C and pSW-coh T , respectively.
Swapping was performed using overlap-extension PCR. The genes coding for the resultant chimeric cohesins of the four consecutive rounds were prepared using appropriate primers as listed in Table SI (see Supplementary Material). For each chimera, two (and in some cases three) successive PCR reactions were required as described in Table SII (see Supplementary Material). The PCR fragments were cleaved and ligated, as described above, into a linearized pET28a-cbm3 plasmid.
Expression and Purification of Recombinant Proteins-MBP-Doc C and MBP-Doc T were expressed and purified according to the pMAL™ Protein Fusion and Purification System: Instruction Manual of New England Biolabs. In this experiment, expression was induced using 0.5 mM isopropyl-1-thio-β-D-galactopyranoside and the TBS column buffer consisted of 137 mM NaCl, 2.7 mM KCl, 25 mM Tris-HCl, pH 7.4. After elution, the proteins were divided into aliquots and stored at −20 °C.
The expression and purification of CBM-cohesin fusion proteins was accomplished as described previously (4) with minor modifications. Induction was initiated with 0.5 mM isopropyl-1-thio-β-D-galactopyranoside. The harvested cells were resuspended in phosphate-buffered saline (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 2 mM KH2PO4, pH 7.4). The fusion proteins, bound to the microcrystalline cellulose, were washed three times with phosphate-buffered saline containing 1 M NaCl and three times with phosphate-buffered saline in the absence of NaCl. Elution and neutralization were done accordingly, and the proteins were stored in 50% (v/v) glycerol at −20 °C. E. coli BL21 pLysS(DE3) was used as a host for the expression of CBM-cohesin fusion proteins and the MBP-dockerin fusion proteins.
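As a bench-side aid, the phosphate-buffered saline composition given above can be converted into weighed amounts. The following is a minimal sketch, assuming anhydrous salts with standard molar masses; the text does not state which hydrate forms were used, and the names `PBS_MILLIMOLAR` and `grams_needed` are illustrative only.

```python
# Molar masses (g/mol) of the ANHYDROUS salts; using a hydrate would change these values.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

# Concentrations (mM) of the phosphate-buffered saline as stated in the text.
PBS_MILLIMOLAR = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 10.0, "KH2PO4": 2.0}


def grams_needed(concentrations_mm, volume_liters=1.0):
    """Return the mass (g, rounded to 3 decimals) of each salt for the given volume."""
    return {
        salt: round(mm / 1000.0 * MOLAR_MASS_G_PER_MOL[salt] * volume_liters, 3)
        for salt, mm in concentrations_mm.items()
    }


recipe = grams_needed(PBS_MILLIMOLAR, volume_liters=1.0)
for salt, grams in recipe.items():
    print(f"{salt}: {grams} g per liter")
```

The pH would still need to be adjusted to 7.4 after dissolving the salts.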
Protein Microarray-Protein microarray was performed according to Ofir et al. 2 Briefly, the CBM-cohesin chimeras were diluted in TBS to final concentrations of 50 and 100 μg/ml, arrayed into a 384-well plate (Genetix, New Milton, Hampshire, UK) and then printed onto a cellulose slide (Zephyr ProteomiX, Kiryat Shmonah, Israel) using a Biorobotics TAS arrayer (Genomic Solutions, Huntingdon, UK) equipped with sixteen 0.2-mm solid pins. MBP-Doc C was labeled with Cy3 mono-reactive dye, and MBP-Doc T and polyclonal rabbit anti-CBM were labeled with Cy5 mono-reactive dye (Amersham Biosciences, Freiburg, Germany) as recommended by the manufacturer. Unconjugated dye was removed using a Slide-A-Lyzer Mini Dialysis Unit (Pierce). The slides were blocked in TBSCB for 1 h at room temperature. The following step was carried out in the dark at room temperature. The slides were incubated with both of the fluorescence-labeled probes, MBP-Doc C (2.5 μg/ml) together with MBP-Doc T (2.5 μg/ml), or the fluorescence-labeled rabbit α-CBM (1:400) in 4 ml of TBSCB for 1 h. The slides were then washed four times with TBSCT for 5 min. The slides were dried and scanned using a Scanarray 4000 (PerkinElmer Life Sciences).

RESULTS
To determine the residue(s) of the cohesins responsible for biorecognition, we combined homologous swapping with an ELISA-based assay. Chimeric cohesins were constructed by progressive generations of swapping, whereby corresponding homologous stretches of the C. thermocellum and C. cellulolyticum cohesins were interchanged and fused to a family-3 CBM from C. thermocellum. The resultant chimeras were tested for specificity of binding by employing an ELISA-based assay using plates coated with dockerins of both rival species fused to the maltose-binding protein (MBP). The corresponding wild-type cohesins were used as reference.
First Generation Swapping-Six chimeric proteins were constructed using overlap-extension PCR. The conserved amino acid pairs highlighted in black in Fig. 1 were used as sites to divide the proteins into three. Four of the six chimeras (Sw1-1 to Sw1-4) bind selectively to the C. thermocellum dockerin with little or no cross-reactivity with the C. cellulolyticum dockerin (Fig. 2B). Sw1-5 is the only chimera of the first round that binds preferentially to the C. cellulolyticum dockerin. In contrast, Sw1-6 exhibits very little, if any, reactivity with either of the two dockerins.
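The count of six first-generation chimeras follows from simple combinatorics: two parental origins for each of three segments give 2³ = 8 arrangements, of which two are the wild-type cohesins. The sketch below enumerates the mixed arrangements and assembles toy chimeras. The segment strings are hypothetical placeholders, not the real cohesin sequences, and the function names are illustrative; only the patterns `tct` (as described for Sw1-2) and `ctc` (as described for Sw1-5) can be tied to named chimeras from the text.

```python
from itertools import product


def chimera_patterns(n_segments=3, species=("t", "c")):
    """List every mixed-origin pattern (e.g. 'tct'), excluding the two wild types."""
    patterns = ["".join(p) for p in product(species, repeat=n_segments)]
    return [p for p in patterns if len(set(p)) > 1]


def assemble(pattern, segments):
    """Concatenate segments, taking segment i from the species named at pattern[i]."""
    return "".join(segments[origin][i] for i, origin in enumerate(pattern))


# Hypothetical toy segments standing in for the thirds of each cohesin.
toy_segments = {
    "t": ["AAAA", "BBBB", "CCCC"],  # placeholder for C. thermocellum thirds
    "c": ["aaaa", "bbbb", "cccc"],  # placeholder for C. cellulolyticum thirds
}

patterns = chimera_patterns()
chimeras = {p: assemble(p, toy_segments) for p in patterns}
for pattern, sequence in chimeras.items():
    print(pattern, sequence)
```

In the laboratory the junctions were made at conserved amino acid pairs by overlap-extension PCR; the string concatenation here only mirrors the bookkeeping of which segment comes from which parent.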
The preference in favor of the C. thermocellum system can be explained by the superior affinity characteristics for the cohesin-dockerin interaction of the latter over those of C. cellulolyticum, as observed previously (5,15,21-23). In any case, Sw1-2, which contains both the first and last third of the C. thermocellum cohesin, binds strongest to the corresponding dockerin. In a similar manner, Sw1-5, which contains both the first and last third of the rival C. cellulolyticum cohesin, also binds to the corresponding dockerin. Based on these results, we concluded that critical amino acid residues, located in the first and last third of the molecule, are involved in the interspecies differential recognition properties. The nominal binding properties observed for Sw1-6 might reflect either the above-described preference (in which case the rival specificity segments would counteract each other) or improper folding in or near the binding face of the molecule.
Second Generation Swapping-In this round, the first and last third of the cohesin molecule were each split into two parts, and the various combinations were prepared. As shown in Fig.  3A, two chimeric proteins were based on Sw1-1, eight chimeric proteins were based on Sw1-2 and another eight were based on Sw1-5. The amino acid pairs highlighted in gray in Fig. 1 were used to divide the first and last third into two.
Chimera Sw2-1 contained only the first sixth of the C. thermocellum cohesin module and Sw2-2 contained only the second sixth of the C. thermocellum cohesin module, while the rest of the molecule was of C. cellulolyticum origin (Fig. 3A, a). Sw2-1 recognized the C. cellulolyticum dockerin alone (Fig. 3B, a), whereas Sw2-2 exhibited very low affinity to either dockerin with a slight but measurable preference for that of C. thermocellum. These results intimate that some of the important specificity residues are located in the second sixth (as opposed to the first sixth) of the molecule.
Sw1-2 was also used as a template to determine the role(s) of smaller portions of the identified specificity regions (Fig. 3A, b). Of the eight chimeric proteins based on this template, only the constructs containing both the second sixth and the last sixth (Sw2-4, Sw2-8, and Sw2-10) showed significant levels of binding to the C. thermocellum dockerin (Fig. 3B, b).
The binding affinity of the eight chimeric proteins based on Sw1-5 (Fig. 3A, c) further strengthened the contention that the second and last sixth are involved in the affinity for the C. thermocellum dockerin. Only constructs containing one or both of these sixths (Sw2-12, Sw2-16, and Sw2-18) exhibit this preference (Fig. 3B, c). Interestingly, Sw2-15 and Sw2-17 contain one of the two sixths, but do not comply with this rule. Again, as described above for chimera Sw1-6, the presence of incompatible segments derived from the rival species may serve to prevent conversion of specificity. Sw2-11, Sw2-13, and Sw2-14 exhibit low levels of binding for the C. cellulolyticum dockerin (Fig. 3B, c). As above, these results can be explained on the basis of the inferior affinity characteristics for the cohesin-dockerin interaction of C. cellulolyticum versus those of C. thermocellum (21,22). Despite the relatively modest affinity observed for these chimeras, the findings also support the premise that in C. cellulolyticum, the relevant region resides within the second and last sixth of the protein.
Third Generation Swapping-In the previous round, the replacement of complementary short stretches of the C. cellulolyticum cohesin on a background of C. thermocellum failed to generate significant affinity toward the C. cellulolyticum dockerin. On the basis of these results, we restricted our subsequent efforts toward unidirectional conversion of affinity from C. cellulolyticum to C. thermocellum. Toward this end, the second and last sixth of the two cohesins were split in three with overlapping regions as depicted in Fig. 4. Fig. 5A illustrates the combinations of cohesin segments used to uncover the successive trail of specificity between the two species; three chimeric proteins were based on Sw2-2, six others on Sw2-10, and another three were based on Sw2-4.
When identifying the part of the second sixth essential for recognition we found that Sw3-5 and Sw3-8 clearly exhibited the highest level of binding to the C. thermocellum dockerin compared with the chimeric proteins derived from the same precursor (Sw2-10). These results indicate that the central segment of the second sixth is involved in the specificity characteristics of the C. thermocellum cohesin (Fig. 5B). Conversely, Sw3-1 and Sw3-7 recognized the C. cellulolyticum dockerin, thus demonstrating that the first segment of the second sixth is not essential for recognition. When locating the part of the last sixth essential for recognition we found that Sw3-8 and Sw3-11 clearly exhibited the strongest binding to the C. thermocellum dockerin compared with their siblings. These results suggest that the central segment of the last sixth is involved in the specificity characteristics of the C. thermocellum cohesin. Based on the results of the third round of swapping, we concluded that selected residues that reside in both of the central segments (designated in Fig. 4) are involved in the specificity characteristics of the C. thermocellum cohesin.
Fourth Generation Swapping-At this point in our studies, the structure of the cohesin-dockerin heterodimer from C. thermocellum was brought to our attention, 3 and we were able to benefit from the consequent knowledge of the overall complement of contact residues. Prior to this stage, the selection of segments was based solely on the aforementioned criteria and/or as a response to the experimental results.
Examination of the crystal structure of the C. thermocellum cohesin-dockerin complex (24) revealed only three dockerin-binding contact residues that were located within the central segments of the cohesin chimeras, as designated in the third generation of swapping. These include Ala-36 and Asn-37 of the β3 strand and Glu-131 of the loop separating β-strands 8 and 9 (Fig. 4). In addition, a gap occurs relative to the cohesin of C. cellulolyticum (Fig. 4). The gap and the suspected recognition residues were thus swapped either alone or combined in the present round. The designated residues in C. cellulolyticum were modified by site-directed mutagenesis to the corresponding residues in C. thermocellum. As shown in Fig. 6A, four chimeric proteins were based on Sw3-5, one chimeric protein was based on Sw3-6, three chimeric proteins were based on Sw3-11 and another four were chimeric proteins distinguished by alterations in single residues.

3 H. J. Gilbert, personal communication.

FIG. 1. Sequence alignment of scaffoldin-borne cohesin 2 from C. thermocellum (Coh2-ct) and cohesin 1 from C. cellulolyticum (Coh1-cc). The conserved amino acid pairs highlighted in black were used as sites to divide the proteins into three in the first generation of swapping. The amino acid pairs highlighted in gray were used to further divide the first and last third into two during the second generation of swapping. The two boxed regions were used for the subsequent third generation swapping. Major secondary structural elements (β-strands) are indicated by arrows. The residues of Coh2-ct and Coh1-cc, in this and subsequent figures, are numbered according to their PDB identification codes 1OHZ and 1G1K, respectively.
The four chimeric proteins based on Sw3-5 (Fig. 6A, a) exhibited very low levels of binding relative to that observed for the parent chimera, although a modest preference for C. thermocellum was maintained (Fig. 6B, a). The results of Sw4-5 relative to Sw3-6 indicated that deletion of Met-31 from the C. cellulolyticum cohesin has little or no effect on the specificity (Fig. 6B, b).
In the three chimeric proteins based on Sw3-11, the first third of C. thermocellum was maintained to support the mutations in the second segment. In Sw4-6, TMS (residues 125-127 of the C. cellulolyticum cohesin) were swapped with DLV; in Sw4-7, SKI (residues 127-129) were swapped with VEQ, and in Sw4-8 Lys-128 with Glu. The results show that at the C-terminal portion of the C. thermocellum cohesin, Glu-131 is a critical determinant for dockerin recognition (Fig. 6B, c). Interestingly, the single mutation exhibited an elevated interaction with the C. thermocellum dockerin, compared with that of Sw3-11, which comprised a greater portion of the C. thermocellum sequence.
FIG. 2. First generation swapping. A, schematic representation of the six chimeric cohesins resulting from swapping among thirds of the C. thermocellum (t, black) and C. cellulolyticum (c, white) cohesins. B, binding analysis of the first generation chimeras. The six chimeric cohesins were fused to the CBM3 module of C. thermocellum, the resultant fusion chimeras were purified and analyzed on ELISA plates for binding to MBP-fused dockerins of both species. Similarly fused, wild-type cohesins (wt-t and wt-c) were used as references.

FIG. 3. Second generation swapping. A, schematic representation of the eighteen chimeric cohesins resulting from swapping sixths of the C. thermocellum (t, black) and C. cellulolyticum (c, white) cohesins. Two chimeric proteins were based on Sw1-1 (a), eight chimeric proteins were based on Sw1-2 (b), and another eight were based on Sw1-5 (c). B, binding analysis of the second generation chimeras. The eighteen chimeric cohesins were purified and analyzed on ELISA plates for species-specific binding activity as described in the legend to Fig. 2.

FIG. 4. Sequence alignment of the second and last sixth of the cohesins derived from C. thermocellum (Coh2-ct) and C. cellulolyticum (Coh1-cc). The arrows represent β-sheets (numbered). The second and last sixth were divided into three with overlapping regions.

All of the final four chimeric proteins included the critical K128E mutation, together with combinations of the ΔM31, G35A, and T36N mutations. The results (Fig. 6B, d) demonstrate that by changing only three residues of the C. cellulolyticum cohesin (G35A, T36N, and K128E), the binding preference was converted to that of the C. thermocellum dockerin instead of its own. Approximately 20% of the wild-type binding activity (compared with that of the C. thermocellum cohesin-dockerin interaction) was observed for the cross-species binding, and about 5% of the original wild-type binding remained for the C. cellulolyticum dockerin. If the T36N mutation was lacking, the resultant C. cellulolyticum mutant showed preference for its own dockerin; in the absence of the G35A mutation, only low levels of activity were observed with negligible, if any, preference for the cross-species interaction.
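The mutation shorthand used here (wild-type residue, position, replacement, as in G35A) lends itself to a small bookkeeping routine when designing such mutants in silico. The sketch below applies point substitutions to a sequence and checks that the stated wild-type residue is actually present at each position. The sequence is a hypothetical placeholder made of `X` filler with only the three relevant residues set; it is not the C. cellulolyticum cohesin sequence, and deletions such as ΔM31 are deliberately not handled.

```python
import re


def apply_substitutions(sequence, mutations, first_residue_number=1):
    """Apply substitutions such as 'G35A'; positions follow the given numbering offset."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - first_residue_number
        if not 0 <= index < len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[index] != wild_type:
            raise ValueError(f"expected {wild_type} at {mutation}, found {residues[index]}")
        residues[index] = replacement
    return "".join(residues)


# Hypothetical placeholder sequence: filler 'X' with G35, T36 and K128 in place.
toy = list("X" * 150)
toy[34], toy[35], toy[127] = "G", "T", "K"
toy_sequence = "".join(toy)

mutant = apply_substitutions(toy_sequence, ["G35A", "T36N", "K128E"])
changed = [i + 1 for i, (a, b) in enumerate(zip(toy_sequence, mutant)) if a != b]
print(changed)
```

The wild-type check guards against numbering mismatches, which matter here because the two cohesins are numbered according to different PDB entries.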
Simultaneous Analysis of the Four Rounds of Swapping Using Protein Microarray-In order to validate the above-described results obtained with the ELISA assay, we established a protein microarray assay, based on selective binding of the CBM component of the chimeras to a cellulose-containing surface. 2 The latter protocol also allowed us to determine simultaneously the specificity of all the cohesin-containing chimeras toward both species of dockerin probe on a single microarray. Besides the simultaneous detection of binding to both dockerin-containing probes, this approach was complementary in format to the ELISA assay, in that the cohesin-containing chimeras were now the immobilized modules in contrast to immobilization on ELISA plates of the dockerin-containing MBP fusion proteins.
In this protocol, all of the chimeras were printed on a cellulose slide, which was incubated simultaneously with Cy3-labeled (red) MBP-Doc C and Cy5-labeled (green) MBP-Doc T (Fig.  7). We confirmed that the chimeras were printed in equivalent quantities by incubating a chip from the same batch with Cy5-labeled anti-CBM antibodies (data not shown).
The results of the protein microarray experiments were essentially consistent with those obtained with the ELISA assay. For example, in the first generation swapping (Fig. 2), Sw1-5 was the only chimera that recognized MBP-Doc C , and this finding was confirmed by protein microarray (Fig. 7). Likewise, the highest level of binding of MBP-Doc T , as observed by the ELISA-based assay in the first round, was exhibited by Sw1-2, followed by Sw1-4, Sw1-3, and Sw1-1. Again, this trend was evident in the protein microarray experiments. In fact, the trend clearly continued in successive generations of swapping, such that in every case but one (Sw2-12), the chimera shown to recognize one or the other dockerin exhibited the same preference in the protein microarray assay as in the ELISA assay.
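A comparison of this kind between two assay formats can be expressed as a simple concordance calculation over per-chimera preference calls. The sketch below is one possible way to do so and is not the procedure used in this work: the background cutoff, the chimera names, and all signal values are hypothetical placeholders, not measured data.

```python
def preference(signal_doc_c, signal_doc_t, background=0.1):
    """Call the preferred dockerin, or 'none' if both signals fall below background."""
    if max(signal_doc_c, signal_doc_t) < background:
        return "none"
    return "Doc C" if signal_doc_c > signal_doc_t else "Doc T"


def concordance(calls_a, calls_b):
    """Return (fraction of shared chimeras with identical calls, list of discordant names)."""
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("no chimeras in common between the two assays")
    discordant = [name for name in shared if calls_a[name] != calls_b[name]]
    return (len(shared) - len(discordant)) / len(shared), discordant


# Hypothetical (Doc C, Doc T) signals in arbitrary units; NOT data from this study.
elisa_signals = {"chimera_A": (0.05, 1.2), "chimera_B": (0.9, 0.2),
                 "chimera_C": (0.02, 0.03), "chimera_D": (0.3, 0.6)}
array_signals = {"chimera_A": (0.04, 0.8), "chimera_B": (0.7, 0.1),
                 "chimera_C": (0.01, 0.05), "chimera_D": (0.5, 0.2)}

elisa_calls = {name: preference(*s) for name, s in elisa_signals.items()}
array_calls = {name: preference(*s) for name, s in array_signals.items()}
fraction, discordant = concordance(elisa_calls, array_calls)
print(fraction, discordant)
```

Because the two formats immobilize opposite partners, absolute signals are not comparable between them; comparing only the categorical preference call sidesteps that difference.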

DISCUSSION
Initial studies on the cohesin-dockerin interaction from different species indicated a general lack of intraspecies specificity (4-6), whereas between species the interaction appeared to be selective (5,7). This rule usually prevails within a given scaffoldin, but not among different scaffoldins, even those produced by the same species (26-30). A better understanding of this phenomenon will ensue from accumulation of new cohesin and dockerin sequences from different species and by examining the characteristics of the intermodular cohesin-dockerin interaction on the molecular level. Moreover, characterization of the molecular components of this high affinity interaction will provide insight into protein-protein interactions in general.

FIG. 5. Third generation swapping. A, schematic representation of the twelve chimeric cohesins resulting from swapping the short segments of the C. thermocellum (t, black) and C. cellulolyticum (c, white) cohesins, designated in the legend to Fig. 4. Three chimeric proteins were based on Sw2-2 (a), three chimeric proteins were based on Sw2-4 (c), and another six were based on Sw2-10 (b). B, binding analysis of the third generation chimeras, purified and analyzed for species-specific binding activity as described in the legend to Fig. 2.

FIG. 6. Fourth generation swapping. A, schematic representation of the twelve chimeric cohesins resulting from swapping short segments of the C. thermocellum (t, black) and C. cellulolyticum (c, white) cohesins and/or by mutagenic replacement of individual amino acid residues of the C. cellulolyticum cohesins with those of C. thermocellum. Four chimeras were based on Sw3-5 (a), one on Sw3-6 (b), three were based on Sw3-11 (c) and another four were chimeric proteins with mutations of individual residues (d). B, binding analysis of the fourth generation chimeras, purified and analyzed for species-specific binding activity as described in the legend to Fig. 2.
Previous research has determined several residues of the dockerin domain involved in species specificity (7,14,15,31). While there was no prior knowledge of the link between protein sequence and function of the dockerin module, the relatively small size and the conserved patterns within the dockerin sequences enabled a rational mutagenesis approach based on comparative bioinformatics. In this context, we converted the specificity of one dockerin species to that of the other (i.e. from C. thermocellum to C. cellulolyticum) by exchanging the suspected residues (14,15). The concept behind this approach is that successful and exclusive conversion from one species to the other would intrinsically preclude any ambiguity regarding improper folding or nonspecific interaction.
In the case of the cohesin module, application of a similar rational-based approach failed to determine the residues responsible for species-specificity (18 -20). In this case, a clear pattern could not be observed as obtained for the alignments of the dockerin domains (data not shown). Various alternative approaches were thus considered for gaining general insight into the binding characteristics of the cohesin module, including alanine-scanning mutagenesis (32), random point mutagenesis and DNA shuffling (33). In addition to practical obstacles, the resultant cohesin mutants would be subjected to screening on the basis of selective binding to dockerins of divergent specificity. Such screening, however, would likely be inadequate for our purposes, since this approach would select only for the strongest binders to a given dockerin. Desired chimeras, containing minimal numbers of specificity-determining residues, are not necessarily the strongest or exclusive binders.
In view of the above considerations, we explored a different strategy involving progressive homologue swapping, which is a derivative of homologue scanning as described previously (32). The strategy we pursued, however, involved progressive refinement of the regions of interest. The relevant activity (binding to the divergent dockerins) was assessed following consecutive rounds of swapping. Consequently, the approach is independent of preconceived notions and becomes more rational with each round of swapping. In fact, prior knowledge of the structure of the protein is not entirely necessary, although in the later rounds of swapping knowledge of the structure enabled us to focus more intelligently on salient regions or residues. Moreover, this approach precludes the necessity of screening large libraries of mutants; in contrast to conventional DNA shuffling, there is no need for high sequence identity between parent strains. Furthermore, this approach allows us to address the property of differential recognition. Although sequence homology between the cohesin modules of C. thermocellum and C. cellulolyticum is relatively low (32% identity), the structure of the cohesins is well preserved, due to the common fold and conservation of key hydrophobic protein core residues. Thus, progressive homologue swapping of the cohesin molecule should result in viable chimeric species.
The progression of the common denominators that appeared responsible for the observed activity is shown schematically in Fig. 8. After three rounds of swapping, we narrowed down the region of interest into two segments of eight and twelve amino acids of the ~150 amino acids, located mainly on one side of the cohesin and arranged primarily on the β3 strand and the loop bridging β-strands 8 and 9 (Fig. 4). The analysis of the C. thermocellum cohesin-dockerin structure revealed that, of the 16 known contact residues (24), only two hydrogen-bonding residues (Asn-37 and Glu-131) and one hydrophobic contact residue (Ala-36) of the cohesin were situated within these two segments (Fig. 9 and Table I). In fact, Asn-37 and Glu-131 both form direct side-chain-to-side-chain hydrogen bonds with the major dockerin-based residues (Ser-45 and Thr-46, respectively), which were previously implicated in species specificity (7,14). The involvement of the designated cohesin residues in the interspecies recognition is thus strongly supported by the cross-species binding to the C. thermocellum dockerin of the C. cellulolyticum cohesin, in which only the three residues are mutated (G35A, T36N, and K128E). The latter three residues may constitute a minimal number of residues that would effect cross-species interaction. This finding corroborates one of our original assumptions in this work that the two species of cohesin are structurally analogous and can serve as templates for homologue swapping and mutagenesis experiments. It is extraordinary that the additional contact residues appear not to be decisive for species specificity, despite the general lack of conservation, between the two species of cohesin, in the relevant regions of the molecular surface.

FIG. 7. Protein microarray showing the preference of binding of the library of cohesin chimeras to either MBP-Doc C (red spots) or MBP-Doc T (green spots). Quadruplicates of two different concentrations (the upper and lower four spots represent 50 and 100 μg/ml, respectively) of each chimera were printed on the cellulose chip. The slide was subsequently incubated with labeled MBP-Doc C and MBP-Doc T together (in a 10:1 molar ratio, respectively) and finally scanned using Scanarray 4000 (see "Experimental Procedures"). The linear strip of six spots denotes sextuplicates of the same concentrations of the wild-type cohesins from C. thermocellum and C. cellulolyticum (labeled ct and cc, respectively).

FIG. 8. ... that appeared to influence the interspecies preference of dockerin binding between C. thermocellum and C. cellulolyticum. On the right of the scheme are shown the corresponding regions or amino acid residues on the cohesin structure (PDB code 1G1K), color-coded as in the respective scheme. Front represents the 8-3-6-5 strand surface of the cohesin molecule, and Back represents the 9-1-2-7-4 surface.
It is interesting to consider what molecular features could account for the differential binding between the two species. In this context, Ser-45 and Thr-46 of the C. thermocellum dockerin are normally replaced in the C. cellulolyticum dockerin by alanine and leucine (or isoleucine), respectively, the latter of which are both markedly more hydrophobic than their cross-species counterparts. One would anticipate that the corresponding interacting residues on the cohesin would reflect this difference. Indeed, the combined exchange between C. cellulolyticum and C. thermocellum of a threonine instead of asparagine and lysine instead of glutamic acid could presumably account for the difference in specificity. Future crystallographic studies of the complex formed between the dockerin and cohesin chimeras should clarify the molecular forces that govern the differential specificity of these interactions.
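The hydrophobicity argument can be made semi-quantitative with a published hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is an illustrative choice introduced here and not an analysis performed in this work; the function name is likewise illustrative. Each exchange is read in the direction C. thermocellum to C. cellulolyticum.

```python
# Kyte-Doolittle hydropathy values for the residues discussed (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "G": -0.4, "S": -0.8, "T": -0.7, "L": 3.8,
    "I": 4.5, "N": -3.5, "E": -3.5, "K": -3.9,
}


def hydropathy_change(from_residue, to_residue):
    """Hydropathy difference (to minus from), rounded to one decimal place."""
    return round(KYTE_DOOLITTLE[to_residue] - KYTE_DOOLITTLE[from_residue], 1)


# Exchanges named in the text, C. thermocellum residue -> C. cellulolyticum residue.
dockerin_exchanges = [("S", "A"), ("T", "L"), ("T", "I")]
cohesin_exchanges = [("N", "T"), ("E", "K")]

for label, pairs in (("dockerin", dockerin_exchanges), ("cohesin", cohesin_exchanges)):
    for source, target in pairs:
        print(label, f"{source}->{target}", hydropathy_change(source, target))
```

On this scale the dockerin exchanges and the cohesin Asn-to-Thr exchange all move toward greater hydrophobicity, whereas Glu-to-Lys barely changes hydropathy. A hydropathy scale does not capture the charge reversal between glutamate and lysine, so that exchange would need a different descriptor.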
Ultimately, we hope to determine the precise elements that control the specificity of interaction between cohesins and dockerins in general. Such an accomplishment would have potential biotechnological value, since a graded range of affinities among affinity partners would serve in various types of applications. The cohesin and dockerin domains are particularly appropriate for the production of complementary hybrid proteins, since both operate naturally as functional modules in parent protein pairs. Further investigation into the biorecognition properties of the cohesin-dockerin interaction in general is currently in progress.

FIG. 9. Comparison of recognition residues versus contact residues on the C. thermocellum cohesin surface (PDB code 1ANU). A, contact residues (see Table I) are those singled out by Carvalho et al. (24). B, putative recognition residues according to the results described in this work. Amino acids involved in direct hydrogen bonding are colored green, hydrophobic residues in the cohesin-dockerin contact surface are colored gray, and amino acids that interact with the dockerin molecule via bridging water molecules are colored cyan.