Crystal structure of a family 54 alpha-L-arabinofuranosidase reveals a novel carbohydrate-binding module that can bind arabinose.

As the first known structures of a glycoside hydrolase family 54 (GH54) enzyme, we determined the crystal structures of the free and arabinose-complex forms of Aspergillus kawachii IFO4308 α-L-arabinofuranosidase (AkAbfB). AkAbfB comprises two domains: a catalytic domain and an arabinose-binding domain (ABD). The catalytic domain has a β-sandwich fold similar to those of clan-B glycoside hydrolases. ABD has a β-trefoil fold similar to that of carbohydrate-binding module (CBM) family 13. However, ABD shows a number of characteristics that distinguish it from CBM family 13, suggesting that it could be classified into a new CBM family. In the arabinose-complex structure, one of three arabinofuranose molecules is bound to the catalytic domain through many interactions. Interestingly, a disulfide bond formed between two adjacent cysteine residues recognizes the arabinofuranose molecule in the active site. From the location of this arabinofuranose and the results of a mutational study, the nucleophile and acid/base residues were determined to be Glu221 and Asp297, respectively. The other two arabinofuranose molecules are bound to ABD. The O-1 atoms of both arabinofuranose molecules bound at ABD point toward the solvent, indicating that both sites can accommodate an arabinofuranose side-chain moiety linked to decorated arabinoxylans.

The plant cell wall is a composite structure consisting mainly of celluloses, hemicelluloses, and lignins. Hemicelluloses are the most abundant renewable biomass polymers after cellulose, and they are key components in the degradation of plant biomass. The complete degradation of cellulose and hemicellulose involves many enzymes. The glycosidic bond between two sugars is cleaved by glycoside hydrolases (GHs) such as cellulases and hemicellulases. GHs have been classified into more than 90 families based on amino acid sequence similarity (Carbohydrate Active enZYmes server at afmb.cnrs-mrs.fr/CAZY) (1,2). They show a wide variety of substrate specificities and great structural diversity. The degradation process is often inefficient because most cellulose and hemicellulose polymers are either insoluble or closely associated with the insoluble cellulose matrix. In addition to the catalytic domain, GHs frequently contain comparatively small non-catalytic carbohydrate-binding modules (CBMs). CBMs enhance the hydrolysis of insoluble celluloses or hemicelluloses by binding to them (3). They exhibit various types of structures and modes of substrate binding. At present, they are classified into more than 30 families based on sequence and structural similarity (4).
We previously reported the cloning, purification, and crystallization of AkAbfB (13,18). Here we report the crystal structures of native AkAbfB and its complex with L-arabinofuranose. These structures revealed that AkAbfB contains a novel CBM that recognizes the arabinose side chains of arabinoxylans.

EXPERIMENTAL PROCEDURES
Data Collection-Expression, purification, crystallization, and x-ray data collection of native and xenon-derivatized crystals were previously reported (18). A platinum derivative was prepared by soaking native crystals in 15 mM K2PtCl4 for 2 days. A mercury derivative was prepared by soaking native crystals in 10 mM HgCl2 for 2 days. The arabinose complex was prepared by soaking native crystals in 50 mM L-arabinose for 90 min. The xylooligosaccharide reagent, a mixture of <5% xylose, 50-70% xylobiose, 20-40% xylotriose, and 1-20% xylotetraose and longer xylooligosaccharides, was purchased from Wako Pure Chemical (Osaka, Japan). Data sets were collected at 100 K either with a CCD camera on station NW12 at the Photon Factory AR, High Energy Accelerator Research Organization, or with an R-AXIS IV++ system on an UltraX18 x-ray generator (Rigaku Corp.). Diffraction images were indexed, integrated, and scaled with the HKL program suite (19) or CrystalClear 1.3 (Rigaku Corp.).
Phasing and Refinement-Programs SOLVE (20) and RESOLVE (21) were used for phase calculation and density modification, respectively. Program ARP/wARP (22) was used for automatic model building. Visual inspection of the models was performed using XtalView (23). Several rounds of energy minimization and individual B-factor refinement were carried out using CNS 1.1 (24). The arabinose complex structure was solved starting from the refined native structure. The figures were prepared using SPOCK (25), Raster3D (26), MOLSCRIPT (27), and XtalView.

RESULTS
Overall Structure-The arabinose-free structure of AkAbfB was determined at 1.75 Å resolution (Table I) and refined to an R-factor of 19.1% and an Rfree of 20.9%. Residues 18-499 were built (the first 17 residues are removed as a signal peptide). There was one AkAbfB monomer per asymmetric unit. AkAbfB is an almost all-β protein organized into two domains: an N-terminal catalytic domain (residues 18-335) and a C-terminal domain (ABD, residues 336-499) (Fig. 1A). There are three disulfide bonds (Cys21-Cys31, Cys81-Cys86, and Cys176-Cys177) in the catalytic domain and one (Cys401-Cys439) in ABD. All eight of these cysteine residues are conserved in GH54 enzymes.
At one of the two potential N-glycosylation sites in the AkAbfB sequence, Asn202 exhibited extra electron density emanating from its side-chain amide nitrogen atom. The electron density of two N-acetylglucosamine units was clearly visible, but that of the adjacent mannose unit was not observed (Fig. 2A). N-Linked glycosylation of the recombinant AkAbfB expressed in P. pastoris has already been confirmed by SDS-PAGE and periodic acid-Schiff staining (18). The native enzyme produced by A. kawachii also carries a glycan chain (data not shown). The glycosylated recombinant AkAbfB expressed in P. pastoris shows higher specific activity (29.3 μmol/min/mg) than the Escherichia coli recombinant protein (18.7 μmol/min/mg) (18), but the two show almost the same thermostability (data not shown). However, a glycosylation-free (T204A) mutant expressed in P. pastoris exhibited significantly reduced thermostability and slightly reduced catalytic activity compared with the wild-type enzyme (T. Koseki, unpublished data). The glycosylation may contribute to the structural integrity of AkAbfB, because the glycosylation site is located at the interface between the two domains, far from the active site.
According to DALI (29), the structure of the catalytic domain is closest to that of the non-catalytic lectin-like domain of the sialidase from Trypanosoma rangeli (TrSA) (Fig. 1B and Table II, Catalytic domain). Of the 318 residues of the AkAbfB catalytic domain, 192 can be superimposed on the lectin-like domain with a Cα rmsd of 2.9 Å (DALI Z-score = 13.8). In TrSA, the non-catalytic lectin-like domain is attached to the catalytic GH33 β-propeller domain (30). Most of the β-strands of the TrSA lectin-like domain and the AkAbfB catalytic domain can be superimposed. However, the region of the TrSA lectin-like domain that corresponds to the catalytic pocket of AkAbfB deviates considerably from it. Possible roles of the non-catalytic domain of TrSA in carbohydrate recognition and cell adhesion have been proposed, but no direct evidence has been obtained for these hypotheses (30). The DALI analysis also indicated structural similarity to another non-catalytic protein, tetanus neurotoxin (Z-score = 12.9, rmsd = 3.1 Å for 188 residues).
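The Cα rmsd values quoted above are root-mean-square deviations over aligned residue pairs after superposition. The following is a minimal Python sketch of that calculation, assuming the residue correspondence (for example, from DALI) and any rotation have already been applied, so only the translation between centroids is removed here; it is not a full Kabsch superposition, and the function names and toy coordinates are hypothetical. It also restates the coverage arithmetic for the catalytic domain (192 of 318 residues).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(xyz: Sequence[Coord]) -> List[float]:
    """Geometric center of a set of atoms."""
    n = len(xyz)
    return [sum(p[k] for p in xyz) / n for k in range(3)]


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord], center: bool = True) -> float:
    """RMSD over aligned C-alpha pairs (a[i] <-> b[i]).

    Assumption: the alignment and any rotation were applied beforehand;
    with center=True only the translation between centroids is removed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    ca = centroid(a) if center else [0.0, 0.0, 0.0]
    cb = centroid(b) if center else [0.0, 0.0, 0.0]
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(a))


# Catalytic domain = residues 18-335 of AkAbfB; 192 residues aligned to TrSA.
CATALYTIC_LENGTH = 335 - 18 + 1          # 318 residues
ALIGNED = 192
coverage = ALIGNED / CATALYTIC_LENGTH    # fraction of the domain superimposed

if __name__ == "__main__":
    # Hypothetical toy coordinates: the same triangle shifted by 5 Å along x.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y, z) for x, y, z in model]
    print(ca_rmsd(model, shifted))                # ~0 after centering
    print(ca_rmsd(model, shifted, center=False))  # 5.0 without centering
    print(f"{coverage:.0%} of the catalytic domain aligned")
```

The coverage works out to about 60% of the catalytic domain, which is why the comparison is framed as a fold-level rather than a domain-wide match.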
The catalytic domain of AkAbfB also shows structural similarity to clan-B GHs (31), e.g. GH16 κ-carrageenase and GH7 cellobiohydrolase (Table II, Catalytic domain). The DALI Z-score with κ-carrageenase from Pseudoalteromonas carrageenovora was 10.3 (rmsd = 3.2 Å for 169 residues). κ-Carrageenase folds into a curved β-sandwich with a tunnel-like active-site cleft (31) and hydrolyzes the β-1,4-linkages of κ-carrageenan bound in this tunnel. The DALI Z-score with cellobiohydrolase I from Hypocrea jecorina (formerly known as Trichoderma reesei) L27 was 7.8 (rmsd = 3.7 Å for 181 residues). Cellobiohydrolase I also folds into a large β-sandwich, with a 50-Å-long cellulose-binding tunnel (32,33), and hydrolyzes β-1,4-linkages from the ends of cellulose chains to liberate β-cellobiose. When AkAbfB is superimposed on these enzymes, the substrate-binding pocket of AkAbfB lies near their substrate-binding tunnels. Both catalytic residues of AkAbfB (Glu221 and Asp297, described later) are located in the concave pocket of the β-jelly roll fold. However, one of them (Asp297) does not superimpose on the catalytic residues of any clan-B enzyme, although the other (Glu221) does. Moreover, AkAbfB does not contain the E-X-D-X-X-E motif of clan-B enzymes. Thus, the catalytic domain of GH54 AkAbfB is fairly distinct from those of clan-B enzymes, although they may be evolutionarily related.
ABD contains three segments, each about 50 amino acid residues long (α subdomain: residues 348-399; β subdomain: 400-446; γ subdomain: 447-499) (Fig. 4, C and D). Each subdomain consists of four β-strands arranged in β-hairpins plus one additional α-helix (α-1), which is inserted between the β-3 and β-4 strands. Each pair of subdomains is structurally similar (Cα rmsd: α/β, 1.6 Å; β/γ, 1.9 Å; γ/α, 3.4 Å), and the three subdomains assemble around a pseudo-3-fold axis. The loop between β-3 and α-1 of the γ subdomain is significantly different from those of the other two subdomains. Amino acid sequence homology between the subdomains is almost undetectable, with only three residues (a serine in the β-1 strand and a tyrosine and a histidine in the β-2 strand) conserved in all three subdomains (Fig. 4D). Three hydrophobic residues present in all subdomains (Leu/Phe in the β-1 strand, Ile/Phe in the β-2 strand, and Trp/Phe in the β-4 strand) gather at the center of the β-trefoil fold, forming a hydrophobic core (Fig. 4D). The hydrophobic core of CBM13 proteins is also formed by hydrophobic residues at similar positions. Therefore, the structural scaffold of ABD is similar to that of CBM13 proteins. However, ABD differs from CBM13 in at least three respects. First, AkAbfB does not contain the G-X-X-X-Q-X-(W/Y) motif of CBM13 (Fig. 4D) (36); the glutamine residue of this motif is thought to play an important role in substrate recognition, and the tryptophan residue to be involved in formation of the hydrophobic core. Second, hydrophobic residues such as tyrosine occupy the position of the comparatively conserved disulfide bond found in each CBM13 subdomain. Instead, ABD contains one disulfide bond (Cys401-Cys439) on the surface of its β subdomain (Fig. 4A). Third, the sugar-binding sites of AkAbfB are completely different from those of CBM13 (described later).
Considering the above points, ABD, the non-catalytic β-trefoil domain of GH54, may be classified as a new CBM family. In the CAZy database, GH54 enzymes, including AkAbfB, have so far not been annotated as containing any CBM, probably because they show no detectable sequence similarity to known CBMs. However, one possible member of GH54, the α-L-arabinofuranosidase from H. jecorina PC-3-7 (HjPCAbf), has been reported to contain a non-catalytic xylan-binding domain in its C-terminal 18-kDa portion, which can be proteolytically cleaved by pepsin (37). HjPCAbf is probably a member of GH54 because its N-terminal amino acid sequence is almost identical to that of the GH54 α-L-arabinofuranosidase/β-xylosidase from H. jecorina RutC-30 (HjRuAbf) (38). Therefore, the xylan-binding domain of HjPCAbf seems to correspond to the ABD of AkAbfB.
Complex Structure with L-Arabinose-An AkAbfB crystal soaked in 50 mM L-arabinose for 90 min diffracted to 2.07 Å resolution (Table I). The structure was refined to an R-factor of 18.8% and an Rfree of 22.1%. The complex structure closely resembled the native structure (Cα rmsd = 0.13 Å). In the electron density map, one arabinofuranose molecule was found in the catalytic domain and two in ABD (Fig. 2, B-D).
To test whether AkAbfB forms complexes with xylose or xylooligosaccharides, crystals were soaked for at least 3 h in 125 mM D-xylose or in a 5% solution of the xylooligosaccharide mixture (see "Experimental Procedures"), but no significant electron density corresponding to a sugar moiety was observed.
One Arabinose Molecule in the Catalytic Domain-One arabinofuranose molecule was found in the negatively charged pocket of the catalytic domain. Because AkAbfB cleaves α-1,2- and α-1,3-arabinofuranosidic bonds, this site is assigned as subsite −1. The arabinofuranose molecule is recognized through many hydrogen bonds and hydrophobic interactions (Fig. 5). The electron density of the O-1 atom was disordered (Fig. 2B), probably because the hydroxyl group was reduced or adopted both α and β configurations. The three other hydroxyl groups, i.e. O-2, O-3, and O-5, each form one or two possible hydrogen bonds with AkAbfB. The location of subsite +1 could not be confirmed in this study; if it exists, it appears to lie outside the catalytic pocket. The interaction at subsite +1 would be loose, because xylose or xylooligosaccharide complexes could not be obtained. A loose interaction at subsite +1 may enable the hydrolysis of both the α-1,2- and α-1,3-arabinofuranosyl linkages that attach the arabinose side chains to the xylan backbone of arabinoxylans.
The important residues involved in catalysis and substrate binding are Cys176, Cys177, Met195, Trp206, Asp219, Glu221, Asn222, Leu224, and Asp297. All of these residues are conserved in GH54 enzymes. The carboxyl groups of Glu221 and Asp297 are located on either side of the anomeric C-1 carbon of arabinofuranose, indicating that they are the catalytic residues (Fig. 5A). As expected for a retaining enzyme (9), these two residues are separated by 5.6 Å. Glu221 lies closer (3.2 Å) to the C-1 atom of arabinofuranose than Asp297 does (3.4 Å). The E221A mutant showed no detectable activity (less than 10⁻⁶-fold the activity of the wild-type enzyme), whereas the D297A mutant retained 10⁻³-fold the wild-type activity. Based on these results and a structural comparison with the active site of GH51 GsAbfA (Fig. 5B) (15), we conclude that Glu221 is the nucleophile and Asp297 is the acid/base residue. However, the addition of a nucleophilic reagent, sodium azide, did not chemically rescue the catalytic activity of the D297A mutant (data not shown), probably because the azide ion could not penetrate into the active site, which lies inside the pocket. When the active sites of GH54 AkAbfB and GH51 GsAbfA were superimposed, the relative positions of arabinofuranose and the two catalytic residues were the same, but the mode of recognition of arabinofuranose was considerably different (Fig. 5B). The O-2 atom of arabinofuranose forms hydrogen bonds with the side-chain carboxyl group of Glu221 and the main-chain nitrogen atom of Asp297. Asp219 forms hydrogen bonds with the O-3 and O-5 atoms and therefore appears to participate in substrate binding. In addition to these acidic residues, there are three more acidic residues in the vicinity: Asp179, Glu184, and Asp189. The O-5 atom also forms a hydrogen bond with the main-chain nitrogen atom of Asn222. The C-4 atom forms hydrophobic interactions with the sulfur atoms of Cys176 and Cys177. The C-5 atom forms hydrophobic interactions with the sulfur atoms of Cys177 and Met195 and the carbon atoms of Trp206 and Leu224.
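The geometric arguments in the preceding text rest on interatomic distances: a carboxyl-carboxyl separation near 5.5 Å is typical of retaining glycosidases, versus roughly 10 Å for inverting ones (a textbook rule of thumb, not a value derived in this study), and the hydrophobic contacts are pairs of atoms about 4 Å apart. A minimal Python sketch of both checks follows; the atom labels and coordinates are hypothetical placeholders, not AkAbfB coordinates, and the 4.0 Å cutoff is an assumed default.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Rule-of-thumb catalytic carboxyl separations (Å); assumptions, not fitted values.
TYPICAL_RETAINING = 5.5
TYPICAL_INVERTING = 10.0


def likely_mechanism(separation: float) -> str:
    """Return the mechanism whose typical carboxyl separation is closer."""
    if abs(separation - TYPICAL_RETAINING) <= abs(separation - TYPICAL_INVERTING):
        return "retaining"
    return "inverting"


def close_contacts(ligand: Dict[str, Coord], protein: Dict[str, Coord],
                   cutoff: float = 4.0) -> List[Tuple[str, str, float]]:
    """All ligand/protein atom pairs within `cutoff` Å, nearest first."""
    hits = []
    for lig_name, lig_xyz in ligand.items():
        for prot_name, prot_xyz in protein.items():
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff:
                hits.append((lig_name, prot_name, d))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    print(likely_mechanism(5.6))   # separation reported for Glu221/Asp297
    # Hypothetical coordinates for illustration only.
    sugar = {"C4": (0.0, 0.0, 0.0), "C5": (1.5, 0.0, 0.0)}
    neighbours = {"SG_Cys176": (0.0, 3.9, 0.0), "SG_Cys177": (1.5, 0.0, 3.8)}
    for lig, prot, d in close_contacts(sugar, neighbours):
        print(f"{lig} - {prot}: {d:.2f} Å")
```

A brute-force double loop is adequate for a single ligand; a whole-structure contact search would normally use a spatial index instead.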
Interestingly, the disulfide bond between the adjacent residues Cys176 and Cys177 is formed across an uncommon non-proline cis-peptide bond between them (Fig. 2E). The distances between these sulfur atoms and the C-4/C-5 atoms of arabinofuranose are about 4 Å, and this disulfide bond seems to recognize these carbon atoms through hydrophobic interaction. In GH51 GsAbfA, a tryptophan side chain replaces this disulfide bond and similarly recognizes the carbon atoms of arabinofuranose through hydrophobic interaction (15). A disulfide bond between adjacent cysteines joined by a cis-peptide bond has rarely been reported, i.e. only for methanol dehydrogenase (39) and human ribonuclease inhibitor (40). In methanol dehydrogenase from Methylobacterium extorquens, the disulfide bond formed by Cys103 and Cys104 lies immediately above the pyrroloquinoline quinone. Incubation of this methanol dehydrogenase with 5 mM dithiothreitol leads to reversible inactivation (41), and each cysteine-to-serine mutant is completely inactive (42), indicating that the disulfide bond is required for activity. In contrast, the activity of AkAbfB did not decrease at all even after incubation with 125 mM dithiothreitol for 80 min; the disulfide bond may not have been reduced because it is located deep inside the pocket. The double mutant (C176A/C177A) exhibited a significantly increased Km (>10 mM) and a decreased kcat (3.9 s⁻¹) toward p-nitrophenyl α-L-arabinofuranoside compared with the wild-type enzyme (Km = 0.76 mM and kcat = 26.8 s⁻¹). A comprehensive database search of about 23,000 Protein Data Bank entries released before 2003 revealed only two similar examples in which a sugar moiety makes hydrophobic interactions with a disulfide bond formed by two adjacent cysteines.
One example is the interaction between the disulfide bond formed by Cys241-Cys242 of GH43 arabinanase A from Cellvibrio japonicus (formerly Pseudomonas cellulosa) and the arabinose moiety of the bound arabinan (PDB 1GYE) (43). The C-1 atom of arabinose residue AHR804 is located 3.6 and 4.0 Å from the Sγ atoms of Cys241 and Cys242, respectively. The other example is the interaction between the disulfide bond formed by Cys82-Cys83 of the agglutinin from Anguilla anguilla and bound fucose (PDB 1K12) (44). The distances between the C-1 atom of fucose and the Sγ atom of Cys82 and between the C-2 atom of fucose and the Sγ atom of Cys83 are both 3.9 Å. However, in both examples the two adjacent cysteines are connected by a normal trans-peptide bond.
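Two quantities in the preceding discussion can be made concrete. Whether a peptide bond such as Cys176-Cys177 is cis or trans is normally judged from the backbone torsion ω, defined by CA(i), C(i), N(i+1), and CA(i+1): near 0° for cis and near 180° for trans (the ±30° tolerance below is an assumption). The kinetic data for the C176A/C177A double mutant can be summarized as the catalytic efficiency kcat/Km; because only a lower bound on its Km (>10 mM) is available, the mutant value is an upper bound. This is a hedged Python sketch with hypothetical helper names, not the analysis the authors performed.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def peptide_state(omega: float, tol: float = 30.0) -> str:
    """Classify a peptide bond from omega = CA(i)-C(i)-N(i+1)-CA(i+1)."""
    if abs(omega) <= tol:
        return "cis"
    if abs(omega) >= 180.0 - tol:
        return "trans"
    return "distorted"


def efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Catalytic efficiency kcat/Km in mM^-1 s^-1."""
    return kcat_per_s / km_mM


wild_type = efficiency(26.8, 0.76)          # wild-type AkAbfB
double_mutant_max = efficiency(3.9, 10.0)   # upper bound, since Km > 10 mM
min_fold_loss = wild_type / double_mutant_max
```

With these numbers, kcat/Km falls from about 35 mM⁻¹ s⁻¹ to at most about 0.39 mM⁻¹ s⁻¹, i.e. at least a 90-fold loss of efficiency when the vicinal disulfide is removed.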
In the complex structure, the arabinofuranose rings of all three molecules appear to adopt the ⁴T₀ conformation (an asymmetrical twist with C-4 above and the endocyclic O-4 below the plane). If so, this conformation is the same as that of the Michaelis complex with arabinofuranose-α(1,3)-xylopyranose reported for GH51 GsAbfA (Fig. 5B) (15). However, the limited crystallographic resolution does not allow a detailed discussion of the sugar conformations in AkAbfB.
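Furanose ring shapes such as ⁴T₀ are usually assigned from Cremer-Pople puckering coordinates. As a hedged illustration (this study did not assign conformations quantitatively), the Python sketch below computes the amplitude q2 and phase φ2 of a five-membered ring from atom coordinates listed in ring order. Mapping φ2 onto a particular envelope or twist descriptor depends on which ring atom is numbered first, so that step is deliberately omitted; the `pentagon` helper builds a hypothetical idealized ring for testing, not real arabinofuranose coordinates.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def cremer_pople_5(ring: Sequence[Coord]) -> Tuple[float, float]:
    """Puckering amplitude q2 (Å) and phase phi2 (degrees, 0-360) of a 5-ring.

    Mean plane from the Cremer-Pople R' x R'' construction, then the m = 2
    Fourier component of the out-of-plane displacements z_j.
    """
    n = len(ring)
    if n != 5:
        raise ValueError("expected exactly five ring atoms in ring order")
    center = [sum(p[k] for p in ring) / n for k in range(3)]
    r = [[p[k] - center[k] for k in range(3)] for p in ring]
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = _cross(r1, r2)
    length = math.sqrt(_dot(normal, normal))
    normal = [x / length for x in normal]
    z = [_dot(p, normal) for p in r]          # out-of-plane displacements
    scale = math.sqrt(2.0 / n)
    a = scale * sum(z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    b = -scale * sum(z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q2 = math.hypot(a, b)
    phi2 = math.degrees(math.atan2(b, a)) % 360.0
    return q2, phi2


def pentagon(radius: float = 1.2, lift: float = 0.0) -> List[Coord]:
    """Hypothetical idealized ring; `lift` raises atom 0 out of the plane (Å)."""
    pts = []
    for j in range(5):
        t = 2 * math.pi * j / 5
        pts.append((radius * math.cos(t), radius * math.sin(t), lift if j == 0 else 0.0))
    return pts
```

A flat ring gives q2 = 0; lifting one atom produces an envelope whose phase falls on a multiple of 36°, which is the property a conformational assignment would build on.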
Two Arabinose Molecules in ABD-In ABD, two arabinofuranose molecules were found in pockets in the β and γ subdomains but not in the α subdomain. The O-1 atoms of both arabinofuranose molecules were clearly seen in the electron density map and were exposed to the solvent. Therefore, AkAbfB seems to be able to recognize arabinofuranose residues linked to the xylan backbones of arabinoxylans.
The β and γ subdomain pockets bind arabinofuranose in the same manner (Fig. 2, C and D). Arabinofuranose in the β subdomain pocket is stacked between Tyr417 in the β subdomain and Tyr456 in the next (γ) subdomain. Similarly, arabinofuranose in the γ subdomain pocket is stacked between Tyr464 in the γ subdomain and Tyr359 in the next (α) subdomain. An aspartate residue in each subdomain, Asp435 (β subdomain) and Asp488 (γ subdomain), forms hydrogen bonds with the O-2 and O-3 atoms of arabinofuranose. Moreover, a histidine residue in each subdomain, His416 (β subdomain) and His463 (γ subdomain), forms a hydrogen bond with the O-5 atom. Mutation of these aspartate residues (D435N and D488N) did not significantly reduce the p-nitrophenyl α-L-arabinofuranoside-hydrolyzing activity (93% and 82% of the wild-type level, respectively), indicating that these sites are not themselves involved in catalysis.
When the β and γ subdomains were superimposed, the arabinofuranose molecules and their binding residues (histidine, aspartate, and two tyrosine residues) almost completely overlapped (Fig. 4C). In the α subdomain pocket, the residues corresponding to the aspartate and the two tyrosines are replaced by one glutamate and two threonines, respectively (Fig. 4D). These replacements appear to prevent an arabinofuranose molecule from binding to the α subdomain pocket.
A number of complex structures of β-trefoil lectins and CBM13 have been reported. The lactose-complex structure of the ricin toxin B-chain (RTB) from Ricinus communis (45) shows that RTB has three galactose-binding sites. The complex structures of the CBM13s from Streptomyces olivaceoviridis E86 xylanase XynC and from S. lividans xylanase 10A with various mono- or oligosaccharides have also been reported (46-48). RTB and CBM13 bind their sugar ligands at equivalent sites formed by two β-strands (β-2 and β-3) and the loop between β-3 and β-4. In contrast, the arabinose-binding sites of the ABD of AkAbfB are clearly different: they are shifted about 8 Å toward the next subdomain, and the structures around them are very different (Fig. 4, A and B). An additional helix (α-1) is involved in sugar recognition by the ABD of AkAbfB.
DISCUSSION
According to the classification of CBMs defined by Boraston et al. (49), ABD can be grouped into fold family 2 (β-trefoil) along with CBM13. Although the details of substrate recognition by ABD are still unknown, it appears to belong to the type C (small sugar-binding) CBMs, because at least three hydrogen bonds (two from aspartate and one from histidine) are formed in the small arabinose-binding pocket of each arabinose-binding subdomain.
CBMs that bind crystalline cellulose possess a flat hydrophobic surface that interacts with adjacent chains on the surface of the crystal lattice (50, 51), whereas CBMs that bind amorphous cellulose, mannan, and xylan possess clefts that interact with a single chain of their poly- or oligosaccharide ligands (52). Polysaccharide backbone-degrading enzymes such as xylanases frequently contain CBMs, and several crystal structures of CBMs in complex with their target saccharide ligands have been reported. For example, P. cellulosa xylanase Xyn10C contains CBM15, whose crystal structure in complex with xylopentaose has been reported (53). The central three sugars of xylopentaose lie in the main cavity. Interestingly, the 2-OH and 3-OH groups of the bound ligand are solvent-exposed in four of the five observed binding subsites, which explains the ability of such CBMs to bind highly decorated xylans. Recently, crystal structures of CBMs complexed with oligosaccharides decorated with side-chain sugars have been reported. The crystal structures of S. olivaceoviridis CBM13 in complex with α-L-arabinofuranosyl- and 4-O-methyl-α-D-glucuronosyl-xylooligosaccharides have been determined (47). The sugar-binding site in CBM13 interacts mainly with a single xylose moiety of the xylooligosaccharides. In the complexes with α-L-arabinofuranosyl-xylooligosaccharides, the arabinofuranose residue is positioned on the opposite side of the binding site and held through only weak interactions with the protein. In the complexes with 4-O-methyl-α-D-glucuronosyl-xylooligosaccharides, the 4-O-methyl-α-D-glucuronosyl residue is solvent-exposed and does not interact with the protein. Thus, these CBMs can accommodate even highly heterogeneous saccharides carrying numerous side-chain sugars by interacting only with the backbone sugars.
The residues involved in arabinose binding by ABD are relatively conserved in the putative C-terminal xylan-binding region of HjRuAbf (Fig. 4D), as well as in all other GH54 members. Therefore, the xylan-binding domain of HjPCAbf probably also confers the ability to bind arabinofuranose side chains. The β subdomain pocket of ABD lies on the same face as the active-site pocket (Fig. 3), about 21 Å away from it. Given that one β-1,4-xylobiose unit is about 10 Å long, the β subdomain pocket could accommodate an arabinofuranose side chain separated by about four xylose units from the cleavage site. However, more detailed information on the binding specificity and affinity of the new GH54 CBM is still needed to define its function and biological role. By comparison, GH51 GsAbfA contains a non-catalytic β-sandwich domain that shows structural similarity to the non-catalytic domain C of α-amylases and to cellulose-binding domains (15), but no sugar-binding ability has been reported for this β-sandwich domain. This difference may explain why GH54 enzymes exhibit higher hydrolytic activity toward insoluble arabinoxylans than GH51 enzymes (8,14).
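The span estimate above is simple proportional arithmetic: about 21 Å between the two pockets, divided by about 5 Å per xylose residue (half of the ~10 Å xylobiose repeat), gives roughly four residues. A short Python restatement follows, assuming a straight, fully extended xylan chain between the two sites, which is a simplification; the constant and function names are illustrative only.

```python
XYLOBIOSE_REPEAT_A = 10.0    # Å per beta-1,4-xylobiose unit (value used in the text)
POCKET_SEPARATION_A = 21.0   # Å between the active site and the beta subdomain pocket


def xylose_residues_spanned(distance_a: float,
                            xylobiose_a: float = XYLOBIOSE_REPEAT_A) -> float:
    """Number of xylose residues covering `distance_a`, assuming an extended chain."""
    per_xylose = xylobiose_a / 2.0   # two xylose residues per xylobiose repeat
    return distance_a / per_xylose


if __name__ == "__main__":
    n = xylose_residues_spanned(POCKET_SEPARATION_A)
    print(f"{n:.1f} xylose residues, i.e. about {round(n)}")
```

Because a real arabinoxylan chain need not run straight between the two pockets, the result is best read as an order-of-magnitude estimate.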