Crystal Structures of Clostridium thermocellum Xyloglucanase, XGH74A, Reveal the Structural Basis for Xyloglucan Recognition and Degradation

The enzymatic degradation of the plant cell wall is central both to the natural carbon cycle and, increasingly, to environmentally friendly routes to biomass conversion, including the production of biofuels. The plant cell wall is a complex composite of cellulose microfibrils embedded in diverse polysaccharides collectively termed hemicelluloses. Xyloglucan is one such polysaccharide whose hydrolysis is catalyzed by diverse xyloglucanases. Here we present the structure of the Clostridium thermocellum xyloglucanase Xgh74A in both apo and ligand-complexed forms. The structures, in combination with mutagenesis data on the catalytic residues and the kinetics and specificity of xyloglucan hydrolysis, reveal a complex subsite specificity accommodating seventeen monosaccharide moieties of the multibranched substrate in an open substrate binding terrain.

Plant cell wall polysaccharides are the most abundant carbohydrate polymers in nature and constitute an important renewable natural source of energy available for conversion to biofuels (1). The plant resource is, however, difficult to exploit, primarily because its components are extremely resistant to degradation; plant cell wall polysaccharides are often present as insoluble, often cross-linked, structures in which cellulose is the most abundant component. In the cell wall of flowering plants, cellulose is cross-linked by two major types of glycan: xyloglucans and glucuronoarabinoxylans. The xyloglucans form a complex network of hydrogen-bonded interactions with cellulose microfibrils that confers rigidity and extensibility to the walls of all dicotyledons and about one-half of monocotyledons (2). The xyloglucan polysaccharide consists of a linear chain of β-1,4 D-glucan regularly substituted with α-1,6 D-xylosyl units, which is, in a species-dependent manner, further derivatized with α-L-arabinose or β-D-galactose (2, 3) (Fig. 1). In primary cell wall xyloglucans, the first galactose moiety in the oligosaccharide repeat is commonly substituted with α-1,2 L-fucose (4). Considerable interest in the structure, biosynthesis, and enzymatic modification of xyloglucans has been sustained because of the important role these polysaccharides play in plant cell wall morphogenesis (4-8), as well as the emerging technical applications of xyloglucans in food products, pharmaceutical delivery (9, 10), cellulose fiber modification (11-13), and biofuel production (1).
Microorganisms have evolved sophisticated mechanisms to degrade plant cell wall polysaccharides and consequently exploit this rich carbon and energy source. Aerobic bacteria and fungi secrete several individual enzymes that synergistically degrade plant cell walls (14). Some anaerobic microorganisms, notably Clostridia, utilize a large multi-enzymatic complex called the cellulosome (15). The cellulosome displays a consortium of hydrolytic plant cell wall degrading enzymes, which may change with time, including cellulases, hemicellulases, pectinases, and various esterases. The Clostridium thermocellum (Ct) cellulosome is one of the best studied cellulosome systems. This cellulosome is a multiprotein complex of about 3 MDa and displays endoglucanase, cellobiohydrolase (exoglucanase), xylanase, chitinase, and β-glucanase (lichenase) enzymatic activity (16). Cellulosome enzymes are tethered to the scaffolding protein of the complex through the interaction of dockerin domains with one of the nine cohesin platforms of the scaffold (17) (Fig. 2A).
There is particular interest in the exploitation of the cellulosome from C. thermocellum, primarily because of the potential it offers for the degradation of lignocellulosic waste and subsequent generation of ethanol (reviewed in Ref. 1). To date, 71 open reading frames have been identified as cellulosomal components in Ct and about 23 genes can be ascribed a direct role in cellulose hydrolysis (18). About half of the other proteins in the cellulosome are described as hemicellulolytic enzymes, highlighting the importance of these accessory enzymes in the processing of cellulosic composites. Recently, two major enzymes implicated in hemicellulose degradation by the Ct cellulosome have been characterized: an endo-β-xylanase and a xyloglucanase (19). Xyloglucanase Xgh74A (Fig. 2B) is the first xyloglucanase identified in Ct and the first active xyloglucanase in a cellulosome, and it likely plays an important role in the degradation of dicot plant cell wall polysaccharides.
Many xyloglucanases have been classified into glycosyl hydrolase family 74 (hereafter GH74) in the Carbohydrate Active Enzyme (CAZy) classification (20); recently reviewed in Ref. 21. From the structural point of view, the only member of the GH74 family published to date is the reducing end-specific cellobiohydrolase (OXG-RCBH) from Geotrichum sp. M128 (22). OXG-RCBH recognizes the reducing end of various xyloglucan-derived oligosaccharides and releases two glucosyl residues of the type GG, XG, or LG (nomenclature according to Ref. 23, see also Fig. 1), suggesting the presence of at least four subsites (24). Additionally, it was noted that the glucosyl residue at position +2 (nomenclature according to Ref. 25) has to be unsubstituted, while that at −1 should preferentially possess a xylosyl substituent. The OXG-RCBH structure consists of a tandem repeat of two seven-bladed β-propeller motifs with the catalytic center formed by the interface of these two domains.
The recent enzymatic characterization of the endoxyloglucanase, Xgh74A, shows that the enzyme hydrolyzes the glycosidic bond of the unbranched glucosyl residues in xyloglucan, to yield XXXG, XLXG (or XXLG), and XLLG oligosaccharides (19). Here we describe the crystal structure of Xgh74A both as an uncomplexed apoenzyme at 2.1 Å resolution, and that of an inactive variant, Xgh74A-D70A, with a single molecule of XLLG and one of XXLG bound in the active site cleft. These structures, in light of kinetic and hydrolysis data, reveal the specificity determinants responsible for xyloglucan recognition and provide insight into the hydrolysis of this important plant cell wall polysaccharide.

EXPERIMENTAL PROCEDURES

Cloning of Ct Xgh74A Catalytic Domain
The cellulosomal xyloglucanase A from C. thermocellum is a bi-modular enzyme containing an N-terminal family 74 glycoside hydrolase (GH) catalytic domain followed by a C-terminal dockerin (19). To express the xyloglucanase catalytic module (Xgh74A hereafter) in Escherichia coli, the DNA fragment encoding the protein domain was amplified by PCR from C. thermocellum YS genomic DNA with the thermostable DNA polymerase Pfu Turbo (Stratagene). The primers, 5′-CTCGCTAGCATTTCCAGCCAGGCTGTA-3′ and 5′-CACCTCGAGATCTGAAGCAGGTTCGCC-3′, incorporated NheI (GCTAGC) and XhoI (CTCGAG) restriction sites, respectively. The amplified product was ligated into pMOSBlue (Amersham Biosciences) and sequenced to ensure that no mutations had occurred during the polymerase chain reaction. The recombinant pMOSBlue derivative was digested with NheI and XhoI, and the excised Xgh74A-encoding gene was cloned into the similarly restricted expression vector pET21a to generate pCG1. CtXgh74A encoded by pCG1 contains a C-terminal His6-tag.
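As a quick consistency check on the cloning design, the following minimal sketch locates the NheI and XhoI recognition sequences in the two primers quoted above. The helper name `find_sites` is hypothetical and introduced only for illustration; the recognition sequences are the standard ones for these two enzymes.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG"}

# Primer sequences as quoted in the text (5'->3'), with the line-break
# hyphens from the typeset article removed.
FORWARD = "CTCGCTAGCATTTCCAGCCAGGCTGTA"
REVERSE = "CACCTCGAGATCTGAAGCAGGTTCGCC"


def find_sites(primer):
    """Return {enzyme: 0-based offset} for each recognition site in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


if __name__ == "__main__":
    print("forward primer:", find_sites(FORWARD))
    print("reverse primer:", find_sites(REVERSE))
```

Each primer carries exactly one of the two sites, preceded by three extra 5′ nucleotides.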

Site-directed Mutagenesis
Mutants of Xgh74A were generated using a PCR-based QuikChange site-directed mutagenesis kit (Stratagene) according to the manufacturer's instructions and using pCG1 as the template DNA. The sequences of the primers used to generate the protein mutants were as follows: 5′-GATTTATGCACGTGCCGCTATCGGAGGAGCGTACC-3′ and 5′-GGTACGCTCCTCCGATAGCGGCACGTGCATAAATC-3′, D70A; 5′-CTTGTAAGTGCAGTTGGGGCCCTTGTCGGTTTTGTTC-3′ and 5′-GAACAAAACCGACAAGGGCCCCAACTGCACTTACAAG-3′, D480A. The mutated DNA sequences were sequenced to ensure that only the appropriate mutations had been incorporated into the nucleic acid.
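QuikChange-style mutagenesis uses a pair of fully complementary primers, so each reverse primer should be the reverse complement of its forward partner. The sketch below checks this for the two primer pairs listed above; the function and variable names are illustrative only and are not taken from the original work.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer pairs (forward, reverse), both written 5'->3' as in the text.
PRIMER_PAIRS = {
    "D70A": (
        "GATTTATGCACGTGCCGCTATCGGAGGAGCGTACC",
        "GGTACGCTCCTCCGATAGCGGCACGTGCATAAATC",
    ),
    "D480A": (
        "CTTGTAAGTGCAGTTGGGGCCCTTGTCGGTTTTGTTC",
        "GAACAAAACCGACAAGGGCCCCAACTGCACTTACAAG",
    ),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pairs_are_complementary(pairs):
    """Map each mutant name to True if its two primers are exact reverse complements."""
    return {
        name: reverse_complement(fwd) == rev for name, (fwd, rev) in pairs.items()
    }


if __name__ == "__main__":
    print(pairs_are_complementary(PRIMER_PAIRS))
```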

Production of Recombinant Xgh74A and Mutants
E. coli Tuner cells (Novagen) harboring the pET21a-Xgh74A plasmid were cultured in LB containing ampicillin to mid-exponential phase (A600 of 0.6), at which point cultures were transferred to 20°C and induced by addition of 1 mM isopropyl 1-thio-β-D-galactopyranoside (IPTG), whereupon they were grown for a further 20 h. SeMet-labeled Xgh74A was produced in E. coli B834 (DE3) containing the pET21a-Xgh74A plasmid, with recombinant protein expression induced by 1 mM IPTG and incubation at 20°C for 20 h. Cells were harvested by centrifugation and disrupted by sonication in 20 mM HEPES-NaOH, 400 mM NaCl, pH 7.5 buffer. The cell-free extract was incubated at 65°C for 15 min and centrifuged to remove insoluble material. Samples were further purified by Ni2+ affinity chromatography and buffer exchanged to 10 mM HEPES-NaOH, pH 7.5. Xgh74A samples thus purified were assessed as pure by SDS-PAGE and were used for crystallization experiments. Xgh74A-D70A and D480A mutants were produced and purified following the native Xgh74A expression and purification protocols.
FIGURE 1. The general structure of xyloglucan. Endoglucanases, including Xgh74, typically cleave the glycosidic bond of the unbranched Glc residue (arrow) to yield xylogluco-oligosaccharides. Common structures in seed xyloglucans (e.g. Tamarindus indica) include XXXG (x = 0, y = 0, R = H), XLXG (x = 1, y = 0, R = H), XXLG (x = 0, y = 1, R = H), and XLLG (x = 1, y = 1, R = H). XXFG (x = 0, y = 1, R = α-1,2 L-Fuc) is found in xyloglucans from dicot primary cell walls. Oligosaccharide nomenclature is according to Ref. 23.

High Performance Anion-Exchange Chromatography with Pulsed Amperometric Detection (HPAEC-PAD)
Oligosaccharides were analyzed on a Waters HPLC system with a Dionex Carbopac PA100 column. A Waters Concorde electrochemical detector was used in PAD mode with a 3-mm gold electrode and a HyREF platinum reference electrode. Two optimized gradients were used for different sizes of xylogluco-oligosaccharides.

Mass Spectrometry of Xylogluco-oligosaccharides
Mass spectrometric analysis was performed with a Q-Tof™ 2 mass spectrometer fitted with a nanoflow ion source (Waters Corporation, Micromass MS Technologies, Manchester, United Kingdom). External calibration of the TOF analyzer (single-reflectron mode, resolution >10000 FWHM) was obtained over the m/z range 50-1000 using a solution of NaI (1.5 g/liter) in 1:1 2-propyl alcohol/water. Solutions of xylogluco-oligosaccharides (typical concentration 0.01-0.1 g/liter in 1:1 MeOH/water containing 0.5 mM NaCl) were infused into the ion source (3 kV) at 200 nl/min (syringe pump). The cone voltage was varied between 35 V and 130 V to optimize the intensity of [M+Na]+ and [M+2Na]2+ ions. Argon was present in the collision cell at all times, and the collision energy was 10 V. A scan time of 2.5 s with an interscan delay of 0.1 s was used, and continuum data were collected until an acceptable signal-to-noise ratio was achieved after the combination of individual spectra (typically 1-30 spectra).
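For orientation, the expected m/z values of sodiated xylogluco-oligosaccharides can be estimated from their one-letter codes. The sketch below is illustrative only and is not part of the original analysis: it assumes the usual nomenclature (G = Glc; X = Glc carrying Xyl; L = Glc carrying Xyl and Gal; F = Glc carrying Xyl, Gal, and Fuc) and standard monoisotopic residue masses, and the names `neutral_mass` and `sodiated_mz` are hypothetical.

```python
# Monoisotopic masses in Da (standard values; residues are anhydro-sugars).
HEXOSE = 162.052825       # C6H10O5 (Glc, Gal)
PENTOSE = 132.042260      # C5H8O4  (Xyl)
DEOXYHEXOSE = 146.057910  # C6H10O4 (Fuc)
WATER = 18.010565         # H2O added once per free oligosaccharide
SODIUM_ION = 22.989221    # Na minus one electron

# Residue composition of each one-letter unit: (hexoses, pentoses, deoxyhexoses).
UNITS = {
    "G": (1, 0, 0),
    "X": (1, 1, 0),
    "L": (2, 1, 0),
    "F": (2, 1, 1),
}


def neutral_mass(code):
    """Monoisotopic mass of the free oligosaccharide, e.g. 'XXXG'."""
    hexoses = sum(UNITS[u][0] for u in code)
    pentoses = sum(UNITS[u][1] for u in code)
    deoxyhexoses = sum(UNITS[u][2] for u in code)
    return (hexoses * HEXOSE + pentoses * PENTOSE
            + deoxyhexoses * DEOXYHEXOSE + WATER)


def sodiated_mz(code, n_sodium=1):
    """m/z of the [M+nNa]n+ ion for the given oligosaccharide code."""
    return (neutral_mass(code) + n_sodium * SODIUM_ION) / n_sodium


if __name__ == "__main__":
    for name in ("XXXG", "XLXG", "XXLG", "XLLG"):
        print(name, round(sodiated_mz(name), 3), round(sodiated_mz(name, 2), 3))
```

XLXG and XXLG are isomers and therefore give the same m/z; they can be distinguished chromatographically but not by mass alone.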

Preparation of Xylogluco-oligosaccharides (XGOs) from Deoiled Tamarind Kernel Powder
Mixture of Xylogluco-oligosaccharides Based on a Glc4 Backbone (XLLG, XLXG, XXLG, XXXG)-Deoiled tamarind kernel powder (Saiguru Food Gum Manufacturer, Mumbai, India) (20 g) was suspended in ammonium acetate buffer (1 liter; 10 mM; pH 4.5) at 60°C and was vigorously stirred until homogeneous. The suspension was then cooled to 30°C and crude cellulase from Trichoderma reesei (Fluka) was added (100 mg, 500 units). The resulting solution was incubated at 30°C for 18 h under gentle stirring. The progress of the digestion was monitored by HPAEC-PAD (gradient B). The solution was filtered on a glass fiber filter, 1 ml of NH3 (37% in H2O) was added, and the basic solution was pumped over a Q-Sepharose (GE Healthcare) column (10 cm high, 2.6 cm diameter) to remove the cellulase. The resulting solution was freeze-dried to yield a mixture of XGOs as a white powder (typical yield 8-9 g). The following oligosaccharide composition was obtained by HPAEC-PAD (gradient A): XXXG, XLXG, XXLG, XLLG (2:1:3:3).
Mixture of Higher Order Xyloglucan Oligosaccharides (Glc8-Glc16 Backbone)-A modified protocol based on that described by Vincken et al. (26) was used. One gram of deoiled tamarind kernel powder was dissolved in ammonium acetate buffer (50 ml; 10 mM; pH 4.5) at 60°C for 1 h. The solution was cooled to 30°C and 1 mg (1 unit) of crude cellulase (T. reesei, Fluka) was added. The progress of the digestion was monitored by HPAEC-PAD (gradient B). When HPAEC-PAD analysis indicated the presence of predominantly Glc4 and Glc8 oligosaccharides, the digestion was stopped (16 h) by boiling for 30 min. The solution was cooled to room temperature and filtered on a glass fiber filter, and the filtrate was concentrated in vacuo to a volume of 10 ml. The oligosaccharides were separated by size exclusion chromatography on two Bio-Gel P6 (Bio-Rad) columns (2 × 90 cm, 2.6 cm diameter) connected in series and maintained at 60°C. The products were eluted with ultrapure water at a flow rate of 0.5 ml/min. Fractions (5 ml) were analyzed by HPAEC-PAD (gradient B), pooled, concentrated in vacuo, and finally freeze-dried (typical yield 130 mg of Glc4 oligosaccharides, 190 mg of Glc8 oligosaccharides, 98 mg of Glc12 oligosaccharides, 20 mg of Glc16 oligosaccharides, and 8 mg of Glc20 oligosaccharides).
XXXGXXXG and Partially Degalactosylated Glc12-based Xyloglucan Oligosaccharides-Degalactosylated higher order oligosaccharides were produced exactly as described in the preceding paragraph, except that 4 units of β-galactosidase (Aspergillus niger, Megazyme, Eire) were added immediately after boiling and cooling the crude cellulase digestion mixture. The degalactosylation reaction was carefully controlled by HPAEC-PAD analysis (gradient B) because of contaminating isoprimeverase activity in the commercial β-galactosidase (27).
Although SeMet data were collected, the Xgh74A structure was solved by molecular replacement because the structure of the homologous Geotrichum sp. M128 oligoxyloglucan reducing end-specific cellobiohydrolase (PDB ID: 1SQJ) became available at the time these studies were carried out. The program PHASER (29) was used to place the four molecules of Xgh74A in the asymmetric unit. The Xgh74A model was built and refined against the SeMet data using ARP/wARP (28). Subsequently, the refined SeMet apoenzyme structure was used as a molecular replacement model for the solution of the Xgh74A-D70A ligand complex, which crystallized in a P4₃2₁2 form with two molecules in the asymmetric unit. Manual rebuilding and ligand placement were carried out with COOT (30). The solvent model was built with ARP/wARP, with maximum likelihood refinement using REFMAC (31).

Activity of Xgh74A on Xyloglucan
Kinetics of Xgh74A-Reducing sugars released from the hydrolysis of plant cell wall polysaccharides by Xgh74A were quantified following the method described by Nelson (33) and Somogyi (32). Enzyme activities were measured in 25 mM potassium phosphate buffer, pH 7.0, at 50°C. The assay mixture (100 µl) contained 10 µl of enzyme (with appropriate dilution) and concentrations of substrate ranging from 0.05 to 15 g/liter. Activities were determined in triplicate in the linear range of the reactions.
Limit Digest of Tamarind Xyloglucan and Glc4-, Glc8-, and Glc12-based Xylogluco-oligosaccharides by Xgh74A-Tamarind xyloglucan (1 g/liter) and Xgh74A (0.02 mg/liter) were incubated in 20 mM potassium phosphate buffer, pH 7.0 (total volume 25 µl), at 50°C for 30 min. The reaction was stopped by incubation at 95°C for 10 min. A 10-µl aliquot of the sample was analyzed by HPAEC-PAD (gradients A and B).
For each of the above cases, comparative experiments were carried out by substituting Trichoderma longibrachiatum endoglucanase (0.25 units, EGII, lot 50201, Megazyme, Eire) for Xgh74A under identical conditions.

RESULTS AND DISCUSSION
The Ct β-1,4-xyloglucan hydrolase Xgh74A is an 842-residue protein that consists of an N-terminal catalytic module (residues 1-776) and a C-terminal dockerin module (residues 777-842) (Fig. 2B). For structural and kinetic studies, the Xgh74A catalytic module was cloned into the pET21a vector from residue Ile 28 to Glu 762 and overexpressed in E. coli Tuner cells (Novagen).
Kinetics and Specificity of Xgh74A-Xgh74A was active on tamarind xyloglucan, lichenan, and the artificial substrates carboxymethyl cellulose (CMC 4M, Megazyme, Eire) and hydroxyethylcellulose (Table 1). Xgh74A shows considerably greater catalytic efficiency on xyloglucan (kcat/Km = 63 (g/liter)⁻¹ min⁻¹) than on any of the other substrates, including CMC 4M (kcat/Km = 9 (g/liter)⁻¹ min⁻¹), which like xyloglucan is substituted at the 6 position, in this case with an average degree of substitution of 4 carboxymethyl groups every 10 sugars. The enzyme showed no activity on xylan, polygalacturonic acid, wheat arabinoxylan, rhamnogalacturonan, curdlan, laminarin, galactomannan, galactan, or arabinan, and a small unquantifiable activity on glucomannan. Taken together, the results strongly suggest that Xgh74A is a true xyloglucanase. Limit digest analysis of tamarind xyloglucan hydrolysis by Xgh74A indicates the liberation of XXXG, XLXG, XXLG, and XLLG (Fig. 3) as the major products. Longer oligosaccharides did not accumulate but were observed as transient species by HPAEC-PAD. In contrast to previous work on this enzyme (19), we saw no evidence for the production of XXG, XXX, or XXGG by mass spectrometry (Fig. 4). We speculate that the acidic ESI conditions and/or different ion optics employed by Zverlov et al. (19) (Fig. 4).
To investigate whether Xgh74A cleaves tamarind xyloglucan at a position other than the anomeric carbon of the unbranched glucosyl moiety (Fig. 1), limit digest experiments were performed on xylogluco-oligosaccharides based on Glc4, Glc8, and Glc12 backbones. Cleavage of the β-(1-4) glycosidic bond between two branched Glc units bearing α-(1-6) Xyl residues has been previously reported in the related GH74 oligoxyloglucan reducing end-specific cellobiohydrolase (OXG-RCBH) from Geotrichum sp. (24). Incubation of a high concentration of Xgh74A with a 2:1:3:3 mixture of XXXG/XLXG/XXLG/XLLG showed no formation of smaller oligosaccharides by HPAEC-PAD (data not shown), demonstrating that Xgh74A, in contrast to OXG-RCBH, cannot cut at substituted glucosyl moieties, at least on these short substrates. Similarly, incubation of Xgh74A with XXXGXXXG or a variably galactosylated Glc8 XGO mixture yielded only XXXG or the expected XXXG/XLXG/XXLG/XLLG mixture, respectively, as determined by HPAEC-PAD (data not shown). Likewise, Xgh74A digestion of a partially degalactosylated Glc12 XGO mixture yielded predominantly XXXG and minor amounts of other Glc4-based XGOs (Fig. 3). In all cases the action of Xgh74A, as determined by HPAEC-PAD and/or MS analysis, was indistinguishable from that of commercially available T. longibrachiatum endoglucanase (Fig. 3).
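The cleavage rule implied by these limit digests (hydrolysis only at the glycosidic bond following an unbranched glucosyl unit, G) can be illustrated with a toy model. The sketch below is a simplification offered for illustration, not a description of the experimental analysis: it treats an oligosaccharide as a string of one-letter codes written from the non-reducing to the reducing end, and the function name `limit_digest` is hypothetical.

```python
def limit_digest(chain):
    """Split a one-letter xyloglucan code after every unbranched 'G' unit.

    The chain is read from the non-reducing end to the reducing end, so each
    cut corresponds to hydrolysis at the anomeric carbon of an unsubstituted Glc.
    """
    products = []
    current = ""
    for unit in chain:
        current += unit
        if unit == "G":
            products.append(current)
            current = ""
    if current:  # trailing fragment with no unbranched Glc at its reducing end
        products.append(current)
    return products


if __name__ == "__main__":
    print(limit_digest("XXXGXXXG"))      # Glc8-based substrate
    print(limit_digest("XLLGXXLGXXXG"))  # a hypothetical Glc12-based substrate
    print(limit_digest("XLLG"))          # Glc4-based: no further cleavage
```

Under this rule a Glc4-based oligosaccharide is a limit product, consistent with the absence of smaller species in the digests described above.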
Structure of C. thermocellum Xyloglucanase Xgh74A-The structure of the catalytic module of Xgh74A was solved in a P2₁ crystal form by molecular replacement using the homologous Geotrichum sp. M128 oligoxyloglucan reducing end-specific cellobiohydrolase (OXG-RCBH, PDB ID: 1SQJ) as a search model. The final Xgh74A model, an apoenzyme incorporating SeMet in place of methionine, comprises residues Val 33 to Ser 760 and was refined to crystallographic R-factors of 17.5% (Rcryst) and 21.0% (Rfree) with diffraction data to a resolution of 2.1 Å. Data collection and refinement statistics are summarized in Table 2. The asymmetric unit contains four copies of the polypeptide chain that can be superimposed with an average root-mean-square deviation (RMSD) of 0.25 Å, showing no significant conformational differences due to crystal packing. Some of the contacts between molecules in the asymmetric unit are mediated through the coordination of cadmium ions (added as a crystallization component).
Xgh74A consists of two seven-bladed β-propeller domains (Fig. 5A), as expected from the sequence homology with OXG-RCBH. The N-terminal domain of Xgh74A comprises residues 63 to 459, while the C-terminal domain involves residues 33-62 and 460-760. Similarly to OXG-RCBH, the Xgh74A N-terminal domain is oriented at an angle of ~90 degrees relative to the C-terminal domain, and interactions between these domains occur primarily through H-bonding and hydrophobic interactions over a shared contact area of about 7530 Å². The N- and C-terminal domains, which can be superimposed with an RMSD of 3.1 Å, exhibit 19% sequence identity, which most likely reflects an ancient gene duplication event. The Xgh74A N- and C-terminal domains are connected by two loop segments, one located in the N terminus and the other in the middle of the sequence. In the OXG-RCBH structure, the N- and C-terminal domains are linked by three segments: one in the N terminus, the second in the middle, and the third in the C terminus of the sequence (this C-terminal segment adds a fifth strand to the second blade of the N-terminal propeller, which is absent in Xgh74A).
The overall topology of Xgh74A is thus very similar to OXG-RCBH, with all the secondary structure elements linked through identical connectivity. The Cα traces superimpose with an RMSD of 1.8 Å for 664 equivalent residues, reflecting 39% sequence identity (calculation performed with DALI, Ref. 34). Not surprisingly, greater structural divergence is found in the loops connecting the blades of the β-propeller architecture. The most important of these differences is the different conformation adopted by the loop Thr 397-Pro 406 in Xgh74A compared with its structural equivalent Asn 374-Thr 391 in OXG-RCBH, which may contribute to the significantly different substrate specificity of the two enzymes. These structural details are discussed below in the light of the ligand complexes of Xgh74A.
Active Site Structure-It is immediately apparent (Fig. 5) that the substrate binding region of Xgh74A lies in an open cleft. This groove is formed at the intersection of the N- and C-terminal domains. The surface of this cleft is formed by the loops connecting the β-propeller blades in both domains. In the apo Xgh74A structure some of these loops (Tyr 206-Asp 217, Thr 291-Asn 298, and Asp 524-Asp 527) are disordered, whereas in the ligand-complexed forms they become ordered and participate in substrate binding (described below).
Catalysis by family GH74 enzymes occurs with inversion of anomeric configuration; i.e. the stereochemistry of the product is inverted with respect to the β-linkage of the substrate. A classical interpretation of glycoside hydrolysis with inversion of anomeric configuration implicates two key residues: a catalytic acid, to facilitate leaving group departure by protonation, and a catalytic base, to activate the incoming water molecule for nucleophilic attack by deprotonation (glycosidase catalytic mechanisms are reviewed in Ref. 25). In GH74 enzymes, it is believed (22) that two aspartate residues play the role of Brønsted acid and base; in Xgh74A these are believed to be Asp 480 and Asp 70, respectively. Asp 70 and Asp 480 are located in the middle of the active center cleft, lying on opposite sides and deep within the cavity, with their carboxylate groups ~10 Å apart. Site-directed mutagenesis of either of these residues to alanine results in an inactive enzyme (which, within the sensitivity of the assay, suggests at least 1000-5000 times less activity than the wild-type enzyme). Asp 70 is located in the middle of the loop connecting the second and third strand of the first propeller blade of the N-terminal domain. The peptide sequence in this region is strictly conserved among the members of the GH74 family. In the apoenzyme, Asp 70 forms a hydrogen bonding interaction with the side chain of Glu 459 and two molecules of water. Asp 480 is located in the C-terminal domain in an equivalent position to Asp 70. However, in contrast to Asp 70, Asp 480 does not form H-bonds with protein atoms and points directly into the cleft. It is not immediately apparent what might contribute to an elevated pKa for this catalytic acid in the absence of bound substrate.
Structure of the Xgh74A-D70A Mutant in Complex with Glc4-based Xylogluco-oligosaccharides-To probe the structural determinants of xyloglucan recognition we attempted to co-crystallize Xgh74A and two inactive Xgh74A variants (D70A and D480A) with preparations of xylogluco-oligosaccharides based on Glc4 backbones. Crystals of the Xgh74A-D70A mutant, complexed with a mixture of Glc4-based oligosaccharides, were obtained by co-crystallization and diffracted to 1.95-Å resolution. Crystals were indexed in the P4₃2₁2 space group and contained two molecules in the asymmetric unit. The complex structure is essentially identical to the ordered parts of the apo structure, resulting in an RMSD value of 0.3 Å for the Cα atoms. In the ligand complex structure, however, it is also possible to build the previously disordered loops, which all interact directly with the bound oligosaccharide.
The electron density map displays well defined density for seventeen sugar rings, corresponding to a molecule of XLLG and another of XXLG, on either side of the catalytic center (Fig. 6). The two molecules sit in an extended conformation at the bottom of the cleft on both sides of the catalytic residue Asp 480. It is possible that the desolvation afforded by ligand binding contributes to the pKa elevation of the catalytic acid. The glucosyl backbones extend about 20 Å in opposite directions from the center point described by the Asp 480 residue. The interaction of the two ligand molecules extends over an area of ~316 Å².
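The count of seventeen sugar rings follows directly from the compositions of the two bound oligosaccharides. A minimal sketch of the arithmetic, assuming the standard one-letter nomenclature (G = Glc; X = Glc plus Xyl; L = Glc plus Xyl plus Gal; F = Glc plus Xyl, Gal, and Fuc) and using a hypothetical helper name:

```python
# Number of monosaccharide rings contributed by each one-letter unit.
RINGS_PER_UNIT = {"G": 1, "X": 2, "L": 3, "F": 4}


def sugar_rings(code):
    """Total number of monosaccharide rings in an oligosaccharide code."""
    return sum(RINGS_PER_UNIT[unit] for unit in code)


if __name__ == "__main__":
    bound = ["XLLG", "XXLG"]  # ligands in the negative and positive subsites
    for ligand in bound:
        print(ligand, sugar_rings(ligand))
    print("total", sum(sugar_rings(ligand) for ligand in bound))
```

XLLG contributes nine rings and XXLG eight, giving the seventeen observed in the electron density.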
Despite co-crystallization with a mixture of oligosaccharides (in relative proportions XXXG 2: XLXG 1: XXLG 3: XLLG 3), the species that we observe bound to the enzyme correspond to XLLG in the "minus" subsites (subsite nomenclature according to Ref. 25) and XXLG in the "positive" leaving group subsites. Thus, four β-1,4 glucosyl moieties of one XLLG molecule are located in the negative binding sites −1 (Glc−1), −2 (Glc−2), −3 (Glc−3), and −4 (Glc−4) in an extended conformation. Three α-1,6-linked xylose residues branch from the −2 (Xyl−2′), −3 (Xyl−3′), and −4 (Xyl−4′) glucosyl units, with both Xyl−2′ and Xyl−3′ also bearing a β-1,2-linked galactosyl unit (Gal−2″ and Gal−3″), the latter partially disordered. The mean temperature factor for this oligosaccharide is 16 Å² and its interaction area with the enzyme is ~162 Å². Glc−1 is positioned in the middle of the diagonal line connecting the Cα atoms of the catalytic residues Asp 70 and Asp 480 (Ala 70 in the complex structure). Ala 70 in the Xgh74A-D70A mutant lies below the plane of the Glc−1 ring at a distance of about 5.8 Å between Glc−1 C1 and Ala 70 Cα, consistent with the position demanded for the catalytic base in an inverting mechanism. Asp 480 is located above this plane at a distance of 4.3 Å. Glc−1 forms a network of H-bonding interactions in which all its oxygen atoms except O4 are involved (Figs. 6 and 7). The Glc−1 O1, O2, and O3 atoms interact with the main chain nitrogen of Phe 51, the Arg 158 side chain, and the Asn 154 side chain, respectively, whereas the putative catalytic acid, Asp 480, forms H-bonds with Glc−1 O5 and O6. All the side chains involved in the recognition of Glc−1 appear conserved in the multiple sequence alignment of the GH74 family (Fig. 8). Superposition of the Xgh74A and OXG-RCBH structures shows similar environments around the Glc−1 O6 position (−1′ subsite).
In both structures, the loops enclosing this region adopt similar conformations and some of the side chains lining the cavity are conserved, leaving room to accommodate a xylose residue at the −1′ position. Accordingly, it is not evident what constitutes the structural basis for the preference of endoxyloglucanases to cleave the xyloglucan chain at unbranched glucosyl positions (−1), in contrast with the ability of exo-xyloglucanases to process substrates with a xylose ramification at the −1′ position (24, 35).
At subsite −2, the plane of the Glc−2 ring is rotated 180 degrees relative to the plane of Glc−1, contacting the enzyme through two H-bonding interactions mediated by Glc−2 O2 and O3 with the side chain of Asn 749. The Xyl−2′ ring is located almost parallel to the plane of the Tyr 214 aromatic ring, and its interaction with the protein is mediated by two H-bonds between Xyl−2′ O4 and the side chain oxygen of Tyr 295 and the side chain NH1 of Arg 158. Gal−2″ does not contact the protein directly but forms an H-bond interaction with a water molecule, which in turn interacts with the main chain nitrogen atom of Tyr 214. The Gal−2″ moiety does not appear to be specifically recognized by the enzyme; instead it is located close to the border of the cleft and exposed to the solvent.
The −3 subsite glucoside is "stacked" on the side chain of Trp 125 and does not form any hydrogen-bonding interactions with the enzyme. The Xyl−3′ ring in Xgh74A describes an angle of ~90 degrees with respect to the plane of the Glc−3 ring, pointing directly to the middle of the cleft in the N-terminal to C-terminal direction. This residue forms two H-bonds between its O3 and O4 atoms and the side chain of Asp 731. An additional partially occupied Gal residue (Gal−3″) was modeled in the electron density adjacent to the Xyl−3′ O2 atom. This Gal−3″ residue is located at the entrance of the cleft and exposed to solvent. Gal−3″ does not interact directly with any protein atom but forms one H-bond between its O4 and the O3 of Glc−3. This structural feature is in agreement with the described absence of specificity toward Gal residues on positions −2 and −3. The sugar moieties at position −4, Glc−4 and Xyl−4′, are located at the periphery of the cleft. The plane of the Glc−4 ring lies at an angle of 180 degrees with respect to the plane of Glc−3. This glucosyl unit contacts the enzyme through an H-bond between O2 and the main chain oxygen of Trp 125. The Xyl−4′ residue does not contact any protein atom and only makes H-bonding interactions between its O4 atom and the Gal−2′ O4 and Glc−2 O2 atoms.
In the positive or "leaving group" subsites of Xgh74A, a second oligosaccharide, XXLG, binds in an extended conformation in which an imaginary line drawn through the glucosyl residues at the positive subsites makes an angle of ca. 60 degrees with respect to the trajectory of the glucosyl backbone bound in the negative subsites; i.e. the chain is bent. The interaction area with the enzyme is smaller (154 Å²) than the area of contact over the minus subsites. Direct H-bonding contacts between this molecule and the protein atoms are also fewer and are confined to the +1 site. The catalytic acid, Asp 480, contacts the glucosyl units at the +1 and −1 sites, forming H-bonding interactions with the O4 of the Glc+1 unit, consistent with its role in aiding departure of this group, through proton donation, during catalysis. The +1 sugar residue also interacts with the aromatic nitrogen of Trp 410 (O2) and the main chain oxygen of Gly 430. Asp 524 forms two H-bonding interactions with O2 and O3 of the Xyl+1′ residue and, additionally, the O5 atom of this residue contacts the O3 of Xyl+2′.
The glucosyl backbone starts to emerge from the binding cleft from site +2. None of the sugar residues at sites +2, +3, and +4 makes direct H-bonding contacts with protein atoms, but they are involved in a complex network of interactions mediated by water molecules, as represented in Fig. 6B. In the Xgh74A complex described here there is no density, not even at very low levels, indicative of a galactosyl residue attached to the +2′ xyloside. Indeed, inspection of the structure would suggest that there are steric blockages for the accommodation of a +2″ galactosyl moiety. The interpretation of limit digest patterns, in particular the observation of XLLG, demands that the +2″ region must be able to accommodate a galactosyl moiety during catalysis. One possibility is that the binding mode for XLLG is more flexible, at either protein or ligand levels, than that observed here for XXLG.

Conservation of Xyloglucan Recognition Sites in Members of GH74 Family
Family GH74 groups enzymes that hydrolyze xyloglucan oligosaccharides but can also be active on non-branched substrates such as barley β-glucan (β-1,3/1,4-glucan), carboxymethyl cellulose (CMC), Avicel (microcrystalline cellulose), or galactomannan (β-1,4-mannose). A wide spectrum of activities is observed among members of this family, ranging from enzymes that apparently process only xyloglucan to enzymes that actually prefer non-branched substrates such as barley β-glucan. For example, the Paenibacillus sp. KM21 (36) and Thermobifida fusca YZ (37) xyloglucanases are active only on xyloglucans from various sources and not on non-branched polymers such as barley β-glucan, CMC, Avicel, or xylan. Geotrichum sp. M128 xyloglucanase displays its highest activity on the more branched xyloglucans from tamarind or pea; it is less efficient on barley xyloglucan, which contains fewer xylose decorations, and shows no activity on non-branched substrates (44). At the other end of the spectrum, Thermotoga maritima Cel74 shows its highest activity on barley β-glucan and is about 75% less efficient on tamarind xyloglucan (38).
When interpreted in terms of direct H-bonding contacts with the enzyme, the specificity of the interaction between xylo-oligosaccharide ligands and Xgh74A is confined to positions −3′, −2′, −2, −1, +1, and +1′ (Fig. 8). Apart from the strict conservation of the catalytic residues Asp 70 and Asp 480 and their sequence equivalents in the GH74 family, the residues responsible for the recognition of the glucosyl units at positions −2, −1, and +1 appear highly conserved in the multiple sequence alignment of GH74 members (Fig. 8). Only two substitutions are observed at the glucosyl recognition sites −1 and +1 in the multiple sequence alignment. The first is at the position equivalent to Xgh74A Phe 51, where a Tyr residue appears in Geotrichum sp. OXG-RCBH and in Aspergillus nidulans OREX. This interaction is not mediated by the side chain but by an H-bond between the carbonyl oxygen of the main chain and the Glc −1 O1 atom. The second substitution is at the position equivalent to Xgh74A Trp 410, where a His residue appears in the T. maritima endoglucanase. The sequence around this region in T. maritima Cel74 also differs: a Pro residue appears in the position of an otherwise strictly conserved Gly residue, in a loop that is two residues shorter. The position equivalent to Xgh74A Asn 749, which is responsible for the interaction with the O2 and O3 of the glucosyl residue at position −2, also appears highly conserved in the GH74 alignment.
The positions equivalent to the Xgh74A residues responsible for xyloglucan recognition at the prime subsites are not as well conserved at the sequence level as the positions at the −1, +1, and −2 subsites. The degree of sequence variation increases from position −2′, where some conservation among the GH74 members is observed, to position −3′, where no conservation is observed among the family GH74 members. Thus, the equivalent of Xgh74A Tyr 295, which recognizes a xylose residue at the −2′ subsite, appears conserved in most of the sequences, with the exception of the endoglucanase from T. maritima, where an Asn residue is found instead. The position equivalent to Xgh74A Asp 524, which contacts the xylose residue at the +1′ subsite, appears as Ile, Val, or Ala in the xyloglucanases from Jonesia sp., Thermobifida fusca, and Hypocrea jecorina, respectively. At the −3′ subsite, Xgh74A Asp 731 is not conserved in any of the family members. The loop in which this residue is located varies in length and overall amino acid composition, making it difficult to assess whether an equivalent interaction is present in the absence of structural data on these other family members.
Exo versus Endo Specificity in Family GH74
In both Xgh74A and OXG-RCBH the substrate binding cavities are open grooves well exposed to solvent (Fig. 9). OXG-RCBH is an exoglucanase that releases two glucosyl residues from the reducing end of the xyloglucan polymer, suggesting the presence of at least two negative reducing end subsites and two positive leaving group subsites (22, 24) and demanding the ability to cleave at xylose-substituted glucosyl moieties. In contrast, Xgh74A processes the xyloglucan chain in an endo fashion, releasing segments of four glucosyl residues (Ref. 19 and this work).
Xgh74A residues Trp 125 and Asp 731 appear to contribute to specificity for the sugars located at subsite −3, where they interact directly with Glc −3 and Xyl −3′ (Figs. 5 and 6). In the OXG-RCBH structure, the equivalent structural determinants for sugar recognition at the −3 site are absent. The OXG-RCBH equivalent of Xgh74A Trp 125 is Asp 89, which is situated in a loop two residues shorter than in Xgh74A and is consequently too distant to interact with the Glc −3 residue. Similarly, while Xgh74A Asp 731 interacts directly with the Xyl −3′ O2 and O3 atoms, its structural equivalent in OXG-RCBH, Thr 736, is again distant from Xyl −3′, most likely as a result of conformational constraints imposed on the loop by the presence of residues Gly 734-Pro 735 in OXG-RCBH. These features likely contribute to the reported differences between the two enzymes with respect to the number of reducing end subsites.
Differences between the two enzymes at the +3 subsite (Fig. 9) are more pronounced. The conformations of the structurally equivalent loops Xgh74A Thr 397-Pro 406 and OXG-RCBH Asn 374-Thr 391 are dramatically different, the latter closing the binding cleft immediately after subsite +2 and presumably restricting OXG-RCBH to exo-hydrolysis. The cleft in Xgh74A is thus open at both ends, in contrast to OXG-RCBH, in which the loop Gly 375-His 385 blocks one-half of the substrate binding landscape.
The Xgh74A structure reveals a complex binding architecture in which subsites accommodate seventeen distinct sugar moieties, accounting for the pattern of xyloglucan recognition observed by limit hydrolysis of tamarind xyloglucan. Catalysis occurs with inversion of anomeric configuration in a mechanism in which the kinetically essential aspartates 480 and 70 likely play the roles of catalytic acid and base, respectively. Given the growing importance of plant biomass conversion, especially in the context of the demand for clean energy sources, the Xgh74A structure provides the first insights into the recognition and hydrolysis of this crucial component of the plant cell wall.