Discrete and Structurally Unique Proteins (Tāpirins) Mediate Attachment of Extremely Thermophilic Caldicellulosiruptor Species to Cellulose

Background: Lignocellulose-degrading microorganisms utilize binding modules associated with glycosidic enzymes to attach to polysaccharides. Results: Structurally unique, discrete proteins (tāpirins) bind to cellulose with a high affinity. Conclusion: Tāpirins represent a new class of proteins used by Caldicellulosiruptor species to attach to cellulose. Significance: The tāpirins establish a new paradigm for how cellulolytic bacteria adhere to cellulose. A variety of catalytic and noncatalytic protein domains are deployed by select microorganisms to deconstruct lignocellulose. These extracellular proteins are used to attach to, modify, and hydrolyze the complex polysaccharides present in plant cell walls. Cellulolytic enzymes, often containing carbohydrate-binding modules, are key to this process; however, these enzymes are not solely responsible for attachment. Few mechanisms of attachment have been discovered among bacteria that do not form large polypeptide structures, called cellulosomes, to deconstruct biomass. In this study, bioinformatics and proteomics analyses identified unique, discrete, hypothetical proteins (“tāpirins,” origin from Māori: to join), not directly associated with cellulases, that mediate attachment to cellulose by species in the noncellulosomal, extremely thermophilic bacterial genus Caldicellulosiruptor. Two tāpirin genes are located directly downstream of a type IV pilus operon in strongly cellulolytic members of the genus, whereas homologs are absent from the weakly cellulolytic Caldicellulosiruptor species. Based on their amino acid sequence, tāpirins are specific to these extreme thermophiles. Tāpirins are also unusual in that they share no detectable protein domain signatures with known polysaccharide-binding proteins. Adsorption isotherm and trans vivo analyses demonstrated the carbohydrate-binding module-like affinity of the tāpirins for cellulose. 
Crystallization of a cellulose-binding truncation from one tāpirin indicated that these proteins form a long β-helix core with a shielded hydrophobic face. Furthermore, they are structurally unique and define a new class of polysaccharide adhesins. Strongly cellulolytic Caldicellulosiruptor species employ tāpirins to complement substrate-binding proteins from the ATP-binding cassette transporters and multidomain extracellular and S-layer-associated glycoside hydrolases to process the carbohydrate content of lignocellulose.

Interest in producing biofuels from lignocellulosic substrates has intensified focus on the mechanisms by which microorganisms degrade and utilize plant biomass. To date, most attention has been focused on cellulolytic enzymes implicated in this process, but it has been established for some time that noncatalytic, biomolecular contributions are critical to the degradation of crystalline cellulose (1,2). In addition to catalytic domains, glycoside hydrolases (GHs) capable of this difficult biotransformation typically contain carbohydrate-binding modules (CBMs) that are currently classified into at least 71 families, based on amino acid sequence homology (3,4). CBMs play a role in maintaining proximity between the active site and substrate surface, as well as in modifying the electronic structure of cellulose to promote hydrolysis (3). Metagenomic screening of microbial communities growing on cellulosic materials has typically expanded known inventories of GHs and CBMs (5-7) and has also facilitated the identification of novel CBM families (8) and GH families (9).
Species-level analysis of plant biomass deconstruction has revealed synergism between cellular and enzymatic processes in degrading lignocellulose. Indeed, within the expanding genome sequence databases, novel, often cell membrane-bound, cellulose-degrading systems are being discovered. For example, in cellulolytic members of the Fibrobacteres-Chlorobi-Bacteroidetes phyla, the ruminal bacterium Fibrobacter
Cultivation of Caldicellulosiruptor sp. for Microscopy and Proteomics Screening-Modified DSMZ medium 640 and culturing conditions used in this study are described elsewhere (25). Proteomics screening was conducted, as described previously (24). Cells were fixed for electron microscopy using a 4:1 (v/v) mixture of formaldehyde and glutaraldehyde, respectively. Scanning and transmission electron micrographs were captured at the Laboratory for Advanced Electron and Light Optical Methods (College of Veterinary Science, North Carolina State University). Epifluorescent microscopy was conducted as described previously (25) using Sytox Green (Invitrogen) to stain the cells.
Bioinformatic Analysis-Additional tāpirins were identified in related Caldicellulosiruptor species from the GenBank™ nr database and the IMG database using BLASTP analysis (33) with known protein sequences as the query. Protein sequences were aligned using the MUSCLE algorithm (34). Neighbor-joining phylogenetic trees were estimated and drawn using the MEGA version 6.0 software package, with 500 bootstrap replicates (35). Signal peptide leader sequences were predicted using SignalP 4.1 trained for Gram-positive bacteria (36). Transmembrane domains were predicted using the TMHMM Server, version 2.0, in conjunction with signal peptide data predicted by SignalP (37). InterPro (38) was used to scan for functional protein domain signatures.
Cloning, Production, and Purification of Recombinant Tāpirins-Calkro_0844 (GenBank™ accession number YP_004023543) and Calkro_0845 (GenBank™ accession number YP_004023544) were cloned without their respective signal peptides or transmembrane domains into the expression vector pET46 Ek/LIC (EMD Millipore). Oligonucleotide primer sequences used for cloning, including those for Csac_1073 (GenBank™ accession number YP_001179878), are listed in Table 1. All tāpirin proteins were produced with N-terminal His6 tags for purification via immobilized nickel affinity columns (5 ml of HisTrap, GE Healthcare), according to the manufacturer's protocols. Autoinduction medium (39) was used for induction of protein production, including higher concentrations of kanamycin (100 μg/ml), as recommended for higher phosphate media. Concentration of purified protein was determined using the bicinchoninic acid assay (Thermo Scientific Pierce) with bovine serum albumin for the standard curve.
Binding of Tāpirins to Substrates-Binding reactions with insoluble substrates contained 40 μg of purified protein (Csac_1073, Calkro_0844, or Calkro_0845) and 9 mg of substrate in a total volume of 100 μl. For binding experiments, all of the substrates were washed overnight (10 g/liter substrate) with binding buffer (50 mM MES, 3.9 mM NaCl, pH 7.2) at 70°C and dried at 70°C in an oven prior to being weighed out for the binding assays. Binding was allowed to proceed at 70°C and 750 rpm in a Thermomixer (Eppendorf). After 1 h of incubation, the bound and unbound portions of protein were separated via centrifugation, and the bound fraction was washed three times with binding buffer, centrifuging again after each wash. One volume of 2× Laemmli sample buffer was added to the unbound sample, and an equal volume of 1× Laemmli sample buffer diluted in the binding buffer was added to the bound sample. Samples were boiled for 30 min and then loaded onto an SDS-polyacrylamide gel for separation. SDS-polyacrylamide gels were stained with Gel Code Blue (Pierce) for visualization, with a protein ladder for reference (Benchmark, Life Technologies). Images are representative of three replicates.
Protein Adsorption to Cellulose-Recombinant tāpirin proteins (Calkro_0844 or Calkro_0845) were buffer-exchanged into binding buffer using 10-kDa molecular mass cutoff polyethersulfone ultrafiltration membranes (Millipore). Triplicate samples were established over a range of protein concentrations in microcentrifuge tubes with 3 mg each of substrate and incubated for 1 h at 70°C and 700 rpm in a Thermomixer (Eppendorf). As a control, protein lacking substrate was also incubated in a microcentrifuge tube. Protein concentrations of each data point, for unbound or free protein, were calculated by averaging technical triplicates using the bicinchoninic acid assay. Calculated unbound protein concentrations were then corrected for any protein adsorbing to the microcentrifuge tube. Triplicate data points were fit to a Langmuir isotherm, using Equation 1,

$$E_b = \frac{B_{max} K_a E_f}{1 + K_a E_f} \qquad \text{(Eq. 1)}$$

where E_b is the concentration of bound protein (μmol·g cellulose⁻¹); E_f is the concentration of unbound protein (μM); K_a is the association constant (μM⁻¹); and B_max is the maximum amount of protein bound by cellulose (μmol·g cellulose⁻¹). Parameters of association (K_a) and maximal binding capacity (B_max) were estimated using JMP (version 9, SAS, Cary, NC).

Yeast Surface Display of Tāpirin Proteins-Tāpirins (Calkro_0844, Calkro_0845) were cloned into the vector pCTCON (40), using conventional ligation (Calkro_0845) or Gibson Assembly master mix (Calkro_0844) (New England Biolabs), according to the manufacturer's directions. The cloning strategy excluded signal peptides and/or transmembrane domains; oligonucleotide primers used for cloning are listed in Table 1. Resulting clones were transformed into competent S. cerevisiae strain EBY100, using the Frozen-EZ Yeast Transformation II kit (Zymo Research).
Transformed yeast cells were directly plated on selective SDCAA medium (per 1 liter of medium: 5 g of casamino acids, 6.7 g of yeast nitrogen base (Difco), 20 g of D-glucose, 7.45 g of monobasic sodium phosphate, 5.4 g of dibasic sodium phosphate, 15 g of agar, 182 g of sorbitol). Fusion-protein expression and yeast binding to Avicel were conducted as described by Nam et al. (41). Briefly, for induction of recombinant protein, yeast cultures were initially subcultured into liquid SDCAA (as above, including 1:100 penicillin/streptomycin (Invitrogen), without sorbitol and agar) and allowed to grow overnight at 30°C and 250 rpm in a shaking incubator. Cultures were harvested using centrifugation (2,500 × g for 5 min, 4°C) and resuspended to an absorbance of 1 (A600) in liquid SGCAA medium (as above, substituting 5 g/liter D-galactose for D-glucose). Protein expression continued at 20°C and 250 rpm for 20 h; afterward, the cells were pelleted by centrifugation as above and washed three times with phosphate-buffered saline (PBS), as described elsewhere (41). Cells were resuspended to an A600 of 3 in PBS with 10 mg/ml Avicel. Attachment to Avicel proceeded at 4°C for 18 h with end-over-end rotation.
Immunofluorescence Microscopy with Yeast-Polyclonal antibodies raised against recombinant Calkro_0844 and Calkro_0845 (GeneTel Laboratories, Madison, WI) were used for immunofluorescence of yeast cells expressing either tāpirin. For blocking, the yeast cell/Avicel slurry was resuspended in a blocking solution of 0.1% (w/v) bovine serum albumin in PBS and incubated on ice for 45 min. The cell slurry was then resuspended in primary antibody diluted 100× in blocking solution and incubated with end-over-end rotation at room temperature for 1 h. The cell slurry was then pelleted and washed with blocking solution three times, after which the slurry was incubated with goat anti-rabbit DyLight488 conjugate (Immuno-Reagents) diluted 100× in blocking solution for 30 min at room temperature with end-over-end rotation. Cells were washed one time in PBS and mounted in SlowFade Gold (Invitrogen) prior to epifluorescence microscopy using ×40 magnification.

Truncated C-terminal Tāpirin Purification-Using the pET46 Ek/LIC-derived expression vector described above, selenomethionine-labeled (Acros Organics) Calkro_0844 was produced using protein production strain E. coli B834[DE3], pRARE2. A defined autoinduction medium (42) was used to incorporate selenomethionine into recombinant Calkro_0844. Purified Calkro_0844 was weakly digested with thermolysin (Promega) in the following reaction buffer: 50 mM Tris-Cl, pH 8.0, 0.15 M NaCl, and 0.5 mM CaCl2. Thermolysin was also resuspended at 1 mg/ml in the same reaction buffer. Protein and enzyme were mixed in a mass ratio of 1:500 and incubated for 1 h at 70°C in a thermocycler. After 1 h, the reaction mixture was chilled on ice and immediately loaded on a Sephacryl HR size exclusion column (S-100, GE Healthcare), connected to a BioLogic LP System (Bio-Rad) to purify a roughly 45-kDa fragment. Purity of the fragment was confirmed using SDS-PAGE.
Crystallization-Calkro_0844_C crystals were obtained by sitting drop vapor diffusion using a 96-well plate with PEG ion HT screen from Hampton Research (Aliso Viejo, CA). Fifty μl of well solution was added to the reservoirs, and drops were made with 0.2 μl of well solution and 0.2 μl of protein solution using a Phoenix crystallization robot (Art Robbins Instruments, Sunnyvale, CA). The crystals were grown at 20°C with 0.07 M citric acid, 0.03 M Bistris propane, pH 3.4, and 16% (w/v) polyethylene glycol (PEG) 3350 as the well solution. The protein solution contained 5.5 mg/ml protein in 50 mM Tris, pH 8, and 150 mM NaCl.
Data Collection and Processing-The Calkro_0844_C crystal was flash-cooled in a nitrogen gas stream at 100 K before data collection. The crystallization solution, with the PEG 3350 concentration increased to 30% (w/v), was used as the cryoprotectant for freezing the crystal. Data were collected using an in-house Bruker X8 MicroStar x-ray generator with Helios mirrors and a Bruker Platinum 135 CCD detector. Data were then indexed and processed with the Bruker Suite of programs version 2013.8-1 (Bruker AXS, Madison, WI).
Structure Solution and Refinement-Intensities were converted into structure factors, and 5% of the reflections were flagged for R_free calculations using programs SCALEPACK2MTZ, CTRUNCATE, MTZDUMP, Unique, CAD, FREERFLAG, and MTZUTILS from the CCP4 package of programs (43) version 6.4.0. PHASER EP from the CCP4 interface with HySS (44,45) Phaser (46) was employed for finding the selenium sites using single wavelength anomalous dispersion. Phaser EP failed to build the model using the resulting phases, but using the selenium sites found by Phaser EP Crank2 (47) Table 2. Programs Coot, PyMOL, and ICM were used for comparing and analyzing structures. This structure has been deposited to the Protein Data Bank with entry code 4WA0.

RESULTS

Fig. 1 shows the attachment of a highly cellulolytic Caldicellulosiruptor species, C. kronotskyensis, to Avicel and dilute acid-pretreated biomass (DAP P. trichocarpa × P. deltoides or switchgrass) particles during growth. When observed under transmission electron microscopy, the outer cell surface appears rough, with structures protruding outside of the peptidoglycan layer (see Fig. 1, A and B). Additionally, C. kronotskyensis cells anchored in a web-like matrix were observed using scanning electron microscopy (Fig. 1C). Using epifluorescence microscopy (Fig. 1, D and E), C. kronotskyensis cells can be observed adhering to both Avicel and DAP biomass, implicating these ultrastructural features in attachment to lignocellulose. This observation is also representative of other Caldicellulosiruptor species growing on plant biomass-based substrates (28,52). Previous studies showed that these bacteria are capable of forming biofilms on cellulosic substrates (29), but the intrinsic basis for their direct attachment to solid surfaces is unknown. To further explore mechanisms underlying cell-surface attachment, a proteomics screen was conducted for several weakly to strongly cellulolytic Caldicellulosiruptor species growing on Avicel (see Fig. 2).
From this screen, several Avicel-bound proteins (tāpirins) could be identified, albeit annotated as hypothetical proteins in Caldicellulosiruptor genomes (24). The species previously determined to be moderately to strongly cellulolytic produce these proteins, which were highly enriched in the Avicel-bound (SB) fraction, although only the most cellulolytic species produced a second paralogous protein that was also enriched in the SB fraction (Fig. 2). In addition to normalized spectral counts, peptide coverage over the two tāpirin classes indicated that the first protein class was more abundant (46-73%) than the second one (9-43%). Two weakly cellulolytic Caldicellulosiruptor species (see Fig. 2) produced related proteins that were either enriched in the supernatant fraction (e.g. C. owensensis) or otherwise poorly expressed (e.g. C. hydrothermalis).

Proteomic Analysis of Caldicellulosiruptor Species Growing on Avicel Reveals Novel Cellulose-binding Proteins (Tāpirins)
Bioinformatics analysis indicated that the tāpirins mapped to a locus in Caldicellulosiruptor genomes downstream of a locus predicted to encode a type IV pilus (Fig. 3A). Additionally, the predicted proteins are larger (ranging from 69 to 100 kDa) than typical type IV pilus-associated proteins. The genomes of strongly cellulolytic species (C. bescii, C. kronotskyensis, C. saccharolyticus, and C. obsidiansis) (24) all contained two paralogous classes of tāpirins, delineated by a phylogenetic tree built from alignments of their amino acid sequences (Fig. 3B). Interestingly, no other genes or proteins homologous to these two classes of tāpirins can be found in GenBank™ outside of the genus Caldicellulosiruptor, confirming that these hypothetical proteins are unique to the genus. Furthermore, no protein domain signatures were detected in any homologs of the highly cellulolytic tāpirin proteins. Within each class of tāpirins, the orthologous proteins shared greater than 80% amino acid sequence identity over 95% or more of the protein (supplemental Table 1). The genomes of C. kristjanssonii and C. lactoaceticus also contained two genes encoding proteins that were no more than 41% identical to the tāpirins from the four most cellulolytic Caldicellulosiruptor species (supplemental Table 1). These two tāpirin-like proteins can also be separated into two classes, because they shared even less amino acid homology (~27%) with each other (supplemental Table 1 and Fig. 3).
C. owensensis and C. acetigenus, neither of which degrades cellulose to any extent (25,53), each contained two genes downstream of the type IV pilus locus that had highly divergent amino acid sequences from the other tāpirins and from each other. Finally, C. hydrothermalis, also not capable of significant cellulose degradation (24,25), contained just one protein, which shared little homology with either class of tāpirins from the strongly cellulolytic Caldicellulosiruptor species. However, this protein does share weak amino acid homology, 24%, with a tāpirin from a newly sequenced species, Caldicellulosiruptor sp. strain Rt8.B8 (supplemental Table 1). Another Caldicellulosiruptor species that is not fully sequenced, Caldicellulosiruptor sp. strain Tok7.B1, also encodes a protein that has homology with part of this type of tāpirin (GenBank™ accession number AAD30365), sharing 91% homology over 29% of the protein from Caldicellulosiruptor sp. Rt8.B8. This indicates that there is further diversity among the tāpirins from Caldicellulosiruptor species in nature.
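The identity-and-coverage criterion used above to group tāpirins into classes can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used BLASTP and MUSCLE); the function names, the toy sequences, and the exact definitions of identity (matches over columns where both sequences have a residue) and coverage (aligned columns over query length) are assumptions made here for illustration. The default thresholds follow the values quoted in the text.

```python
def identity_and_coverage(aligned_query, aligned_subject):
    """Return (percent identity, percent query coverage) for two gapped,
    equal-length aligned sequences. Gaps are written as '-'.
    Definitions are illustrative assumptions, not BLASTP's exact ones."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_len = sum(1 for c in aligned_query if c != "-")
    aligned_cols = 0  # columns where both sequences have a residue
    matches = 0       # aligned columns with identical residues
    for q, s in zip(aligned_query, aligned_subject):
        if q != "-" and s != "-":
            aligned_cols += 1
            if q == s:
                matches += 1
    if aligned_cols == 0 or query_len == 0:
        return 0.0, 0.0
    identity = 100.0 * matches / aligned_cols
    coverage = 100.0 * aligned_cols / query_len
    return identity, coverage


def same_class(aligned_query, aligned_subject,
               min_identity=80.0, min_coverage=95.0):
    """Apply the thresholds quoted in the text (both are adjustable)."""
    identity, coverage = identity_and_coverage(aligned_query, aligned_subject)
    return identity >= min_identity and coverage >= min_coverage


if __name__ == "__main__":
    # Hypothetical toy alignment, not real tapirin sequences.
    print(identity_and_coverage("MKTAYIAKQR", "MK--YLAKQR"))
```

Keeping the thresholds as parameters makes it easy to compare the two cutoffs that appear in this paper (95% coverage in the text, 60% in the Fig. 3 legend).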
Confirmation of Cellulose-binding Capacity for C. kronotskyensis Tāpirins-To establish that the tāpirins encoded by Calkro_0844 and Calkro_0845 could bind to cellulose, several approaches were used. First, we sought to determine whether the proteins were capable of binding to cellulose in the absence of any hypothesized interactions with the type IV pilus. Selected representative genes from each class of cellulolytic tāpirin (i.e. Calkro_0844 and Calkro_0845) were expressed as chimeric proteins fused to the C terminus of the yeast α-agglutinin protein to facilitate yeast surface display (40,54). Previously, yeast surface display systems in S. cerevisiae have successfully expressed and fused cellulose-specific CBMs (41) and cellulases (55) from cellulolytic fungi to the yeast cell wall. Other protein complexes from bacterial systems can also be displayed successfully on the yeast surface, including assembly of mini-cellulosomes using components from mesophilic (56) and thermophilic Clostridia (57-59).
After protein induction in the yeast host, each tāpirin could mediate cell attachment to Avicel (Fig. 4, C and D), whereas yeast cells not expressing either of the proteins were unable to attach (Fig. 4, A and B). Using polyclonal antibodies raised against one or the other class of tāpirins (rabbit anti-Calkro_0844 and rabbit anti-Calkro_0845), immunofluorescent microscopy demonstrated that the recombinant proteins were produced and linked to the yeast cell surface. Fluorescent signals, observed after binding of a DyLight488-conjugated secondary antibody (goat anti-rabbit), were primarily located at the interface between the Avicel particles and yeast cells (Fig. 4, G and H). Therefore, it appears that the inherent binding ability of the tāpirins remains intact. This was the case even when they were expressed in a eukaryotic yeast surface display system at temperatures approaching 50°C below the optimal growth temperature of the parent organism and in the absence of type IV pili.
A second in vitro approach sought to confirm the ability of recombinant versions from both classes of tāpirins to bind to a variety of carbohydrate and plant biomass substrates. Both classes of tāpirin proteins demonstrated some binding affinity to a variety of cellulosic substrates, including Avicel, filter paper, and dilute acid-pretreated plant biomass (Fig. 5). Multiple recombinant members of tāpirin class 1 (Csac_1073 and Calkro_0844, see Fig. 3) were used to confirm that similar binding profiles would occur between orthologs (Fig. 5, A and B). Importantly, the binding to cellulose appears to be specific, as neither of the tāpirins tested bound to xylan (Fig. 5, A-C). Presumably, the weak binding to unpretreated biomass is in part due to xylan masking the majority of available cellulose-binding sites. Because the binding assays were conducted at temperatures characteristic of the environment from which the proteins came, a type IV pilus is not required for the tāpirins to adsorb to cellulose.
FIGURE 3. Genomic locus containing type IV pilus-related genes and Caldicellulosiruptor tāpirins. A, layout of genes theorized to encode type IV pili and tāpirins. Genes include, from left to right: response regulator (aqua); PulE (peach); twitching motility protein (dark olive); PulF (light blue); hypothetical proteins with prepilin-type N-terminal cleavage domains (gray); prepilin peptidase (mint); ComFB (white); pilus assembly protein PilM (dark pink); fimbrial assembly protein (cream); and pilus assembly protein PilO (magenta). Caldicellulosiruptor tāpirins are characterized as two classes of proteins by amino acid identity. Colors of the tāpirin genes indicate whether the families are from strongly cellulolytic (blue and yellow), moderately to weakly cellulolytic (green), or weakly cellulolytic (orange and purple) Caldicellulosiruptor species. Members of each color-differentiated tāpirin class (1 or 2) are categorized by 80% or more amino acid sequence identity over more than 60% of the query protein (see supplemental Table 1). B, phylogenetic tree built using amino acid sequences from all sequences identified in Caldicellulosiruptor genomes through homology or position in relation to the type IV pilus operon. MEGA (version 6.0) was used to align amino acid sequences and build a neighbor-joining phylogenetic tree. Branches are colored to correspond with the groups of tāpirin genes noted in A. Species abbreviations follow gene locus tags when possible: Acece, Acetivibrio cellulolyticus CD2; A3M9, Caloramator sp. ALD01; H557, C. acetigenus; Athe, C. bescii; Calhy, C. hydrothermalis; Calkr, C. kristjanssonii; Calkro, C. kronotskyensis; Calla, C. lactoaceticus; COB47, C. obsidiansis; Calow, C. owensensis; Csac, C. saccharolyticus; N908, Caldicellulosiruptor sp. Rt8.B8; N913, Caldicellulosiruptor sp. Wai35.B1; Cst_c, Clostridium stercorarium subsp. stercorarium; Clo1100, Clostridium sp. BNL1100; N907, Thermoanaerobacter cellulolyticus.

To further investigate the specificity of adsorption for both classes of tāpirins to cellulose, binding affinities of both C. kronotskyensis proteins were modeled using Langmuir isotherms (Fig. 6). Affinity data for both classes of tāpirins revealed that each protein bound to filter paper with an association constant (K_a) of ~0.7 μM⁻¹, with more total protein binding to filter paper from tāpirin class 2. Filter paper appears to be the better binding substrate when both measures of affinity (Figs. 5, B and C, and 6) are considered. In contrast, the tāpirin from Calkro_0845 exhibited a higher affinity for Avicel (K_a of 0.94 μM⁻¹) than that of Calkro_0844 (K_a of 0.05 μM⁻¹), although more total protein from Calkro_0844 bound to Avicel. These association constants are within the range (μM⁻¹) previously reported for cellulose-binding proteins associated with Avicel or filter paper, including Trichoderma reesei CbhI. Because there is no sequence similarity between these tāpirins and other cellulose-binding proteins previously reported, it appears that the genus Caldicellulosiruptor has developed a unique mechanism through which cells may attach to cellulose.
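To make the Langmuir analysis concrete, the following Python sketch evaluates the standard Langmuir isotherm and recovers K_a and B_max from bound/free data. It is only an illustration: the authors estimated parameters in JMP, whereas this sketch uses a simple linearized least-squares fit (E_f/E_b regressed against E_f), and the data points are synthetic, noise-free values generated inside the example rather than measurements from this study. All function names are hypothetical.

```python
def langmuir(e_free, k_a, b_max):
    """Bound protein predicted by the Langmuir isotherm:
    E_b = B_max * K_a * E_f / (1 + K_a * E_f)."""
    return b_max * k_a * e_free / (1.0 + k_a * e_free)


def fit_langmuir_linearized(e_free, e_bound):
    """Estimate (K_a, B_max) from the linearized form
    E_f / E_b = E_f / B_max + 1 / (K_a * B_max),
    using ordinary least squares on y = E_f / E_b versus x = E_f.
    A nonlinear fit weights errors differently; this is a sketch."""
    xs = list(e_free)
    ys = [f / b for f, b in zip(e_free, e_bound)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / B_max
    intercept = mean_y - slope * mean_x  # equals 1 / (K_a * B_max)
    b_max = 1.0 / slope
    k_a = slope / intercept
    return k_a, b_max


if __name__ == "__main__":
    # Synthetic example: assumed K_a = 0.7 and B_max = 2.0 (arbitrary units).
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [langmuir(f, 0.7, 2.0) for f in free]
    print(fit_langmuir_linearized(free, bound))
```

With real, noisy data the linearized fit and a direct nonlinear fit can give somewhat different estimates, so the choice of fitting method should be reported alongside K_a and B_max.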
Three-dimensional Crystal Structure of Calkro_0844-To determine how the structure of the tāpirins influenced their interaction with cellulose, a crystal structure was solved for a truncated version of the protein encoded by Calkro_0844. Previous attempts to crystallize the full-length tāpirin without the transmembrane domains were unsuccessful because of protein lysis resulting in unpredictable, nonreproducible, and very slow (10 months and longer) crystal growth. Upon analysis of preliminary x-ray diffraction data from these crystals, it was determined that the asymmetric unit of the crystal cell was too small for the full-length protein. This finding indicated that only a protein fragment could have been crystallized, so a limited proteolysis approach was used. This strategy for obtaining crystals of recalcitrant proteins using proteolysis has been previously described (65,66). In this case, in situ proteolysis proved unsuccessful. However, digestion of the recombinant Calkro_0844 consistently yielded a fragment with the same molecular mass. Therefore, selenomethionine-labeled Calkro_0844 was digested, thus creating discrete domains that would crystallize. Treatment with low concentrations of various proteases, specifically thermolysin, resulted in a protein fragment with an estimated molecular mass of 45 kDa that crystallized successfully after purification.
The structure of the digested Calkro_0844 C-terminal domain (Calkro_0844_C) was refined to a resolution of 1.7 Å with an R and R_free of 0.150 and 0.194, respectively (Table 1 and Fig. 7). There is one molecule in the asymmetric unit (Fig. 7A), and it contains one magnesium ion coordinated by main chain carbonyls of Asp-473, Val-475, Ser-477, and Leu-480, as well as two side-chain oxygen atoms of Asp-473 and Glu-509. This is in contrast to some cellulose-binding CBMs (3) or polysaccharide lyases (67) that complex calcium ions. The core of Calkro_0844_C is a β-helix composed of 11 complete turns in total, plus a few extra β-strands. The longest β-helix contains 14 strands. Furthermore, the ends of the β-helix are capped with α-helices (α1 and α3; Fig. 7A), and the turns of the β-helix are not consecutive. It appears that turns 3 to 11 are formed at the N terminus, and turns 1 and 2 are formed closer to the C terminus of the construct (Fig. 7A).
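For readers less familiar with the R and R_free statistics quoted for the refined structure, the short Python sketch below shows how they are conventionally defined: the summed absolute differences between observed and calculated structure factor amplitudes, divided by the summed observed amplitudes, computed separately for the working reflections and for the reflections set aside for cross-validation (5% in this study). This is a simplified illustration with made-up amplitudes; it omits the scale factor normally applied between observed and calculated amplitudes, and the function names are hypothetical rather than part of CCP4 or any refinement program.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|).
    Assumes Fo and Fc are already on the same scale (simplification)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a boolean free-set flag and return
    (R_work, R_free). Flagged reflections are excluded from refinement
    in practice and serve only for cross-validation."""
    work = [(o, c) for o, c, is_free in zip(f_obs, f_calc, free_flags)
            if not is_free]
    free = [(o, c) for o, c, is_free in zip(f_obs, f_calc, free_flags)
            if is_free]
    if not work or not free:
        raise ValueError("both working and free sets must be non-empty")
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


if __name__ == "__main__":
    # Made-up amplitudes for illustration only.
    f_obs = [10.0, 20.0, 30.0, 40.0]
    f_calc = [11.0, 18.0, 30.0, 44.0]
    flags = [False, False, False, True]
    print(r_work_and_free(f_obs, f_calc, flags))
```

An R_free moderately higher than R, as reported here (0.194 versus 0.150), is the expected pattern for a model that has not been overfit to the working reflections.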
Structural Comparison-Calkro_0844_C is a truly unique structure. A sequence search for known structures from the PDB found no similar entries. Pairwise secondary-structure matching of structures with at least 70% secondary structure similarity by PDBfold (68) found only partial matches to structures with similar folds from the PDB, such as pectate lyases. However, these matches were similar to only a portion of the β-helix core of Calkro_0844_C. Recently, a smaller (~30 kDa) β-helix protein from Clostridium thermocellum was identified as possessing polysaccharide binding abilities; however, it also shares no appreciable similarity to the structure of Calkro_0844_C (69).
Calkro_0844_C has a long loop (residues Ser-443 to Ala-472) connecting the ends of the β-helix between strands β25 and β26 (Fig. 7, A and C). This is a remarkable feature of the protein structure that might play a role in cellulose binding. First, this rigid loop may hold the uniquely folded β-helix domains together, increasing protein stability in the harsh environment. Another hypothesis would have the loop contributing to the cellulose-binding properties of the protein, possibly by directly interacting with cellulose. Co-crystallization of Calkro_0844_C with a mixture of cellobiose, cellotetraose, and cellohexaose, as well as extensive soaking of the existing crystals in the same oligosaccharide mixture, did not reveal soluble oligosaccharides bound in the crystal structure. Binding of the C-terminal fragment to insoluble cellulose and biomass was tested (Fig. 5D), and Calkro_0844_C was found to bind to both cellulose and DAP biomass. This suggests that binding of Calkro_0844_C is specific for longer insoluble chains of cello-oligosaccharides.
The cross-section of the right-handed β-helix core is triangular in shape (Fig. 7B) with three relatively flat faces. Two of the faces are exposed to the solvent and are almost exclusively dominated by hydrophilic residues. The third face is mainly protected by the long loop and partially by the N and C termini. The interface between the long loop and the rest of the protein is lined with hydrophobic residues on both sides. Curiously, there are only five hydrogen bonds between the 30-residue loop and the surface of the β-helix. Multiple alternative conformations of the residues in that range, including main chain atoms, further emphasize the flexible nature of this peptide loop. Movements of this peptide loop would expose the hydrophobic surface of the β-helix to bind to cellulose (Fig. 7, C and D).

FIGURE 7 (caption fragment). Four α-helices are marked, as well as the first and last residues of the protective loop. B, cartoon representation rotated 90° to illustrate the triangular shape of the β-helix core as well as the two exposed surfaces and one protected surface. C, view from the top onto the hydrophobic surface of the β-helix core (semi-transparent surface representation, CPK colors), protective loop (semi-transparent surface, cyan), and N and C termini (cartoon, blue and red, respectively). The first and last residues of the protective loop are marked. D, view from the top onto the hydrophobic surface of the β-helix core with the protective loop and N and C termini removed. Exposed aromatic residues are highlighted in green and are labeled.
Along this region, there is a line of seven aromatic side chains (tyrosines and phenylalanines, Fig. 7D) over nine β-strands, lying flat on the surface and spaced ~5-10 Å apart. That arrangement, possibly together with the hydrophobic residues of the loop Ser-443-Ala-472, could act as an effective binding platform for cellodextrins similar to the flat binding faces of some carbohydrate-binding modules (3,61). When all tāpirin amino acid sequences were aligned against each other, aromatic residues at these seven positions were conserved across both classes of tāpirins from strongly cellulolytic members of the genus Caldicellulosiruptor. The only exception is Tyr-418, which is conserved only among class 1 tāpirins. In class 2 tāpirins, that tyrosine residue is shifted seven amino acids toward the C terminus (data not shown). Moreover, it is possible that these aromatic residues interact with cellulose after the protective loop 443-472 shifts away and exposes the predicted cellulose-binding platform. Such a structural mechanism may be needed for solubility or to prevent nonspecific binding. Although binding assays clearly show that Calkro_0844_C is able to bind insoluble cellulose (Fig. 5D), it should be noted that the N-terminal domain is not present in the crystal structure. It would be in close proximity to the loop area and may have an effect on the loop conformations. The tāpirins described here thus represent a new class of cellulose-binding proteins, and further efforts are warranted to fully understand their unique binding mechanisms at the molecular level.

DISCUSSION
Previous studies of various cellulolytic bacteria have identified proteins that are theorized to aid in attachment to cellulose (11,12,15), indicating that bacteria may use different mechanisms to maintain proximity to their substrate. Cellulolytic Clostridia have been described as using large polypeptide structures, termed cellulosomes, to facilitate attachment to and hydrolysis of plant biomass (22,70). Highly cellulolytic members of the genus Caldicellulosiruptor have previously been demonstrated to adhere to plant biomass (28-30, 52), now including C. kronotskyensis (Fig. 1). The genus Caldicellulosiruptor, while related to cellulosomal bacteria, does not encode cellulosomes, indicating that these species must use an alternative mechanism to attach to biomass. Extracellular S-layer-associated GHs and substrate-binding proteins from ATP-binding cassette transporters have previously been theorized to participate in mediating cellular adherence to plant biomass (30-32); however, many of these proteins are also present in the Caldicellulosiruptor core genome and are not exclusively used by highly cellulolytic members of the genus (24). Here, a combination of comparative genomics, proteomics, functional characterization, and x-ray crystallography was employed to further understand attachment mechanisms used by highly cellulolytic Caldicellulosiruptor species.
Comparative genomics analysis of the genus Caldicellulosiruptor (24) had previously highlighted two hypothetical proteins downstream of a well conserved type IV pilus locus (Fig. 3A). The involvement of type IV pili in attachment to cellulose is plausible, because pilin proteins from both R. albus and R. flavefaciens were demonstrated to bind to cellulose (16,71). However, the Caldicellulosiruptor tāpirins (roughly 70 kDa) are about three times as large as the individual pilin subunits from Ruminococcus species (roughly 20-25 kDa). Furthermore, no protein domain signatures were detected, including pilin signatures, making it unlikely that these proteins are pilin subunits. In the absence of typical pilin processing signals, it is also unlikely that these proteins are incorporated into type IV pili. In addition, these genes (tāpirins) encode proteins that are not homologous to any protein outside of the genus Caldicellulosiruptor. As such, the tāpirins cannot be classified with other known cellulose-binding proteins, such as CBMs or substrate-binding proteins from ATP-binding cassette transporters.
Using proteomics screening, these proteins were found to be highly enriched in the Avicel-bound protein fraction and are hypothesized to be involved in maintaining cell-surface contact with plant biomass. Previous proteomics data also had detected the presence of S-layer-associated GHs and substrate-binding proteins bound to cellulose (24), implicating the tāpirins and other cell surface-associated proteins in maintaining attachment to insoluble carbohydrates. Only the strongly cellulolytic Caldicellulosiruptor species were identified as deploying both classes of tāpirins, and those proteins were enriched on Avicel (Fig. 2). However, other hypothetical proteins downstream of the type IV pilus locus in weakly cellulolytic Caldicellulosiruptor species were expressed upon growth on Avicel, were enriched in the culture supernatant, and are potentially mediating cell-to-cell adherence in a community of both strongly cellulolytic and predominantly xylanolytic members. Based on the detection of signal peptides and transmembrane domains, the tāpirins are likely displayed on the cell membrane. The exact subcellular location of these proteins, however, remains to be determined and is the subject of ongoing experiments.
This study further confirmed and quantified the cellulose-binding function of the two orthologous classes of tāpirins. Based on yeast surface display (Fig. 4, G and H), the tāpirins can mediate attachment to cellulose without any association with other bacterial protein structures, such as the type IV pilus. In this case, S. cerevisiae is an attractive host for surface display of cellulose-binding proteins, because the species does not naturally adhere to cellulose. The affinity for cellulose appears to be relatively specific, as binding assays that included insoluble xylan failed to detect any of the tāpirins bound to this polysaccharide (Fig. 5). Using adsorption isotherms, the association constant for representatives from both classes of tāpirins could be estimated (Fig. 6). Both tāpirins bound to various forms of cellulose with affinities similar to those reported for CBMs (60,62,63,72), fungal swollenin (64), and bacterial expansin (61). Taken together, the tāpirin proteins bind to cellulose with high affinity over a range of temperatures from 25°C (trans vivo) to 70°C (in vitro), confirming their hypothesized function as a new class of cellulose-binding proteins.
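As an illustration of how an association constant can be estimated from an adsorption isotherm, the following minimal sketch assumes a single-site Langmuir model, B = Bmax·Ka·F / (1 + Ka·F), and fits it by the Hanes-Woolf linearization (F/B plotted against F). The model choice, the function names, the units, and the numerical values are assumptions made for illustration only; they are not the fitting procedure or the data of this study.

```python
# Minimal sketch: estimate Bmax and Ka from an adsorption isotherm,
# ASSUMING a single-site Langmuir model. All names and numbers are hypothetical.

def langmuir(free, b_max, k_a):
    """Bound protein predicted by the Langmuir model at a given free concentration."""
    return b_max * k_a * free / (1.0 + k_a * free)


def fit_langmuir(free, bound):
    """Fit Bmax and Ka by ordinary least squares on the Hanes-Woolf form:
    F/B = (1/Bmax) * F + 1/(Ka * Bmax)."""
    x = list(free)
    y = [f / b for f, b in zip(free, bound)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals 1 / Bmax
    intercept = mean_y - slope * mean_x  # equals 1 / (Ka * Bmax)
    b_max = 1.0 / slope
    k_a = slope / intercept
    return b_max, k_a


# Synthetic, noise-free example (hypothetical values, not measurements).
TRUE_B_MAX = 2.0   # assumed units: micromol protein per g cellulose
TRUE_K_A = 0.5     # assumed units: per micromolar
free_conc = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
bound_conc = [langmuir(f, TRUE_B_MAX, TRUE_K_A) for f in free_conc]

fitted_b_max, fitted_k_a = fit_langmuir(free_conc, bound_conc)
print(f"Bmax = {fitted_b_max:.3f}, Ka = {fitted_k_a:.3f}")
```

With real measurements, a nonlinear least-squares fit of the untransformed model is often preferred, because linearizations such as this one distort the error structure of the data.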
Based on amino acid sequence homology, the tāpirins are unique proteins currently only found in the genus Caldicellulosiruptor. Because there are no previously defined functional protein domain signatures in the tāpirin proteins, structural analysis of one class was completed. Here, we provide structural analysis of a truncated peptide derived from Calkro_0844 ("Calkro_0844_C"), a representative member of the tāpirins (Fig. 7). The solved structure of Calkro_0844_C indicates that it is a right-handed β-helix composed of 11 turns that are held together by a long loop, which shields the hydrophobic face of the helix. Because of the unique features of these proteins, structural homology to other classes of proteins could not be assigned, indicating that this is truly a new class of biomolecules. Although no structural similarity could be assigned, the overall function of β-helix-containing proteins in cellular adhesion to glycoproteins has been described for Gram-negative pathogenic bacteria. Some examples of β-helix-containing proteins involved in attachment include larger adhesins from Bordetella pertussis (73,74) and Haemophilus influenzae (75). In contrast, however, the tāpirins are formed by nonpathogenic Gram-positive bacteria and are not known to be part of a type V secretion system. Overall, the tāpirins are significantly larger than currently known polysaccharide-binding modules, such as CBMs (3) or the pectate lyase-like protein from C. thermocellum (69), and establish a new paradigm by which lignocellulose-degrading microbes can attach to plant biomass. The possible exploitation of these novel proteins, or tāpirins, to improve plant biomass degradation for biofuels production is being pursued.