Structural and Functional Characterization of the R-modules in Alginate C-5 Epimerases AlgE4 and AlgE6 from Azotobacter vinelandii

Background: Alginate epimerases consist of catalytic and noncatalytic domains of yet unknown function.
Results: The noncatalytic domains of AlgE4 and AlgE6 possess different alginate binding behavior despite highly similar structures.
Conclusion: Noncatalytic subunits of AlgE6 and AlgE4 influence the product specificity of the catalytic domain.
Significance: This work opens a new route to designing alginate epimerases producing tailored alginates.

The bacterium Azotobacter vinelandii produces a family of seven secreted and calcium-dependent mannuronan C-5 epimerases (AlgE1-7). These epimerases are responsible for the epimerization of β-D-mannuronic acid (M) to α-L-guluronic acid (G) in alginate polymers. The epimerases display a modular structure composed of one or two catalytic A-modules and from one to seven R-modules having an activating effect on the A-module. In this study, we have determined the NMR structure of the three individual R-modules from AlgE6 (AR1R2R3) and the overall structure of both AlgE4 (AR) and AlgE6 using small angle x-ray scattering. Furthermore, the alginate binding ability of the R-modules of AlgE4 and AlgE6 has been studied with NMR and isothermal titration calorimetry. The AlgE6 R-modules fold into an elongated parallel β-roll with a shallow, positively charged groove across the module. Small angle x-ray scattering analyses of AlgE4 and AlgE6 show an overall elongated shape with some degree of flexibility between the modules for both enzymes. Titration of the R-modules with defined alginate oligomers shows strong interaction between AlgE4R and both oligo-M and MG, whereas no interaction was detected between these oligomers and the individual R-modules from AlgE6. A combination of all three R-modules from AlgE6 shows weak interaction with long M-oligomers. Exchanging the R-modules between AlgE4 and AlgE6 resulted in a novel epimerase called AlgE64 with increased G-block forming ability compared with AlgE6.

Alginates are unbranched biopolymers consisting of β-D-mannuronic acid (M) and its C-5 epimer α-L-guluronic acid (G) (1,2). The physical and chemical properties of alginate are influenced by the relative amount of M and G as well as the distribution of the two monomers in the polymer chain (3). Along the alginate chain, the G and M residues are arranged in a pattern showing homopolymeric regions of G residues (G-blocks) and homopolymeric regions of M residues (M-blocks) interspersed by regions in which the two groups coexist in a strictly alternating sequence (MG-blocks). The wide variability of the composition and sequential structure of alginate is a key functional attribute of the polysaccharide. Although the intrinsic inflexibility of the alginate chain increases in the order MG < MM < GG (4), the selectivity for binding of cations and the gel-forming properties are mainly linked to the G-blocks (3,5,6). Recently, it has been shown that G-blocks larger than 100 consecutive residues are essential for forming stable calcium gels (7).
Alginate is produced as poly-M, and certain M residues are then converted to G by epimerases acting at the polymer level. The alginate-producing bacterium Azotobacter vinelandii has one periplasmic epimerase, which incorporates single G residues into the alginate during secretion of the polymer (8). In addition, A. vinelandii produces seven extracellular C-5 alginate epimerases called AlgE1-7 (9,10). Each of the epimerases converts mannuronic acid to guluronic acid in a different pattern, and AlgE7 additionally shows lyase activity (11,12). The seven epimerases consist of two types of structural modules designated A- and R-modules (9,10). The smallest of these epimerases is AlgE4, which consists of one A- and one R-module and introduces MG-blocks into the alginate sequence.
The atomic resolution structures of the A-module (13) and R-module (14,15) of AlgE4 have been determined. It has also been shown that both modules bind to alginate (15,16). The second smallest alginate epimerase is AlgE6, consisting of one A-module followed by three R-modules (AR1R2R3). The A-modules and R-modules of AlgE4 and AlgE6 share high sequence identity and similarity (10), yet the two enzymes produce alginates with very different M and G distribution and content. Whereas AlgE4 introduces MG-blocks into the alginate sequence, AlgE6 produces G-blocks as end products. Only the A-modules are catalytically active (16), whereas the role of the R-module is more enigmatic. R-modules alone show no catalytic activity on alginates; nevertheless, if one R-module is bound after an A-module, the epimerization rate is increased 10-fold (16). Epimerization of alginate by AlgE4 occurs by a processive mode of action, meaning that the enzyme slides along the alginate polymer while both the A-module and the R-module bind to the alginate molecule (17-19). Therefore, it was suggested that the R-module helps to orient the alginate in the active site and to keep AlgE4 tightly bound to the polymer during epimerization. The role of the R-modules in G-block-producing epimerases is not clear; however, recent results obtained with a library of epimerases with mutated A-modules have indicated that R-modules can influence the epimerization pattern (20). These mutant enzymes were constructed with the R-module from AlgE4 to stimulate the activity. Aiming at generating even more potent G-block-forming epimerases, the A-modules of the high G-block-forming epimerase mutants were combined with the three R-modules from AlgE6 (20). Surprisingly, this resulted in mutant epimerases with G-block-forming abilities very similar to those of AlgE6. Moreover, the mode of action for AlgE6 is not as clear as for the processive AlgE4.
To be able to epimerize two or more mannuronic acids in a row, AlgE6 would have to turn 180° around the polymer chain between each epimerization step. A processive mode of action would therefore only be possible if one AlgE6 enzyme slides along the alginate polymer incorporating MG-blocks, while another AlgE6 enzyme epimerizes the newly formed MG-block to a G-block.
To understand the role of multiple R-modules in G-block-producing epimerases and moreover to understand what causes the differences between the R-modules of AlgE4 and AlgE6, we have determined the atomic resolution structures of the three R-modules of AlgE6 by NMR. Additionally, the binding of AlgE4 and AlgE6 R-modules to alginate oligomers with different composition was investigated using NMR spectroscopy and isothermal titration calorimetry (ITC). It is known that AlgE4R binds to alginate, but more detailed binding studies should reveal possible preferences of the R-modules for certain types of alginate. The A- and R-modules have defined elongated structures, but initial tests showed flexibility in the structure between the two modules (21). The overall structures of AlgE4, AlgE6, and the different A- and R-modules were determined by using small angle x-ray scattering (SAXS), also providing information on the orientation of the modules toward each other.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of the R-modules-Plasmids and bacterial strains used in this study are summarized in Table 1. All plasmids, except pAT116 and pAT117, have a T7/lac promoter and ampicillin resistance gene and were transformed into Escherichia coli strain ER2566 for protein expression.
The composition of expression media was as follows: 1 liter of LB medium contained 10 g of tryptone, 5 g of yeast extract, and 5 g of NaCl. The pH was adjusted to 7.2 with 4 M NaOH, and the medium was sterilized by autoclaving. For 1 liter of M9 medium, 7.2 g of Na₂HPO₄·2H₂O, 3 g of KH₂PO₄, 0.5 g of NaCl, and 1 g of (NH₄)₂SO₄ were dissolved in 1 liter of H₂O. The pH was adjusted to 7.4, and the medium was autoclaved. Before the expression, 2 ml of 1 M MgSO₄, 20 ml of trace metal solution, and 2 g of glucose dissolved in 10 ml of H₂O were sterilized by filtration and added to the medium. Trace metal solution for M9 medium contained 0.1 g/liter ZnSO₄, 0.8 g/liter MnSO₄, 0.5
Independent of which medium was used, all cells were grown at 37°C to A600 nm ~0.8. The cell culture was incubated on ice for 5 min. Expression was induced by addition of isopropyl 1-thio-β-D-galactopyranoside (final concentration 1 mM), and then the culture was incubated at 15°C for 16-20 h. pAT116 and pAT117 were transformed into E. coli strains XL10-gold and RV308, respectively, and recombinant protein production was performed as described previously (20,22). The cells were harvested by centrifugation and resuspended in 20 mM HEPES, pH 6.9, 800 mM NaCl, 5 mM CaCl₂, and 0.1% Triton X-100. The cell pellet was stored at −20°C if purification was not performed immediately. Expression of AlgE6R1, AlgE6R2, and AlgE6R3 for structure determination was described recently (23-25). AlgE4 was expressed as a uniformly deuterated and ¹⁵N-labeled protein. AlgE6R1R2R3 and ORF-9 were expressed using the same protocol as for AlgE6R1 (23). For the SAXS measurements, no isotope labeling was required, and protein expression was thus conducted in LB medium.
The cells were sonicated, and the crude cell extracts were loaded on a chitin column (NEB6651, New England Biolabs). The column was washed with at least 10 column volumes of 20 mM HEPES, pH 6.9, 800 mM NaCl, and 5 mM CaCl₂. The proteins were cleaved from the chitin-binding tag by loading washing buffer containing 50 mM DTT on the column and letting it react for 40 h at room temperature. The purified proteins were eluted from the column and dialyzed against 20 mM HEPES, pH 6.9, and 25 mM CaCl₂. If the proteins were not used immediately, they were freeze-dried and stored at −20°C. Proteins used for SAXS were further purified by size exclusion chromatography to remove any aggregates. Production, purification, and refolding of ²H,¹⁵N-AlgE4 and ²H,¹⁵N-AlgE4A were performed as described previously (21).
Size Exclusion Chromatography-The samples for SAXS were concentrated to less than 0.5 ml, or proteins that had been freeze-dried were dissolved in less than 0.5 ml of MilliQ water. The protein solutions were loaded on Superdex 200 10/300 GL (AlgE6 and AlgE4) or Superdex 75 HT 10/30 (AlgE6R1, AlgE6R2, AlgE6R3, AlgE4R, and AlgE4A), respectively. The samples were eluted from the column with 5 mM HEPES, pH 6.9, 50 mM NaCl, 10 mM CaCl 2 , and 0.5 M glycine. The monomeric fractions were concentrated to less than 0.5 ml and dialyzed using 20 mM HEPES, pH 6.9, 125 mM NaCl, 25 mM CaCl 2 , and 0.5 M glycine.
NMR Spectra-All NMR spectra were measured on a Bruker Avance 600 spectrometer equipped with a 5-mm Z-gradient TCI (H/C/N) cryogenic probe and on a Bruker DRX 600 spectrometer equipped with a 5-mm xyz gradient TXI (H/C/N) probe. All measurements were performed at 25°C.
For the NMR structure calculations, uniformly ¹⁵N- and ¹³C-labeled proteins were produced. Cells containing the plasmid pFA8, pFA12, or pFA13 were grown in M9 medium containing ¹⁵N- and ¹³C-labeled nitrogen and carbon sources as described recently (23-25). For the structural calculations, three different types of NOESY spectra were used as follows: a three-dimensional ¹⁵N-edited NOESY (mixing time of 80 ms) recorded on a sample dissolved in 95% H₂O, 5% D₂O; a three-dimensional ¹³C-edited NOESY (mixing time of 80 ms) on a sample dissolved in 99% D₂O; and a two-dimensional NOESY (mixing time of 70 ms) spectrum recorded on an unlabeled sample dissolved in 99% D₂O.
Structural Calculations-NOE cross-peaks were assigned and integrated manually using the program NEASY (26). The structural calculations were performed by the program CYANA (27). For the first structural calculations, torsion angle constraints obtained from the program TALOS (28) were used. After obtaining more refined structures of AlgE6R1, AlgE6R2, and AlgE6R3, the TALOS constraints were excluded from the calculations. Structural calculations started from 100 random conformations, and the 20 final conformations with the lowest target function were selected. Ca²⁺ ions were incorporated in the structural calculation as described previously (15). Refinement of structures was performed by the program YASARA (29) in two steps as follows: first, in vacuo using the NOVA force field, and second, in explicit water using the YASARA force field.
Protein Alignment and Surface Calculation-The amino acid sequences of AlgE4R, AlgE6R1, AlgE6R2, and AlgE6R3 were aligned by ClustalW2 (30) at the European Bioinformatics Institute. Pairwise structural alignment of the β-rolls or core proteins was performed in PyMOL (31). For simplicity, numbering in the alignment of the β-roll (Fig. 1) is defined from the first identical amino acid (Gly-1 in AlgE4R(Gly-387); Gly-2 in AlgE6R1(Gly-383); and Gly-11 in both AlgE6R2(Gly-544) and AlgE6R3(Gly-705)). The core-R-modules are defined from the first identical amino acid until the C-terminal end of AlgE6R1 and AlgE6R2 or the amino acid that is in the same position when the four structures are aligned (Thr-150 in AlgE4R and Ala-161 in AlgE6R3). The structures were aligned pairwise, and the root mean square deviation of the backbone atoms (N, Cα, and C′) was calculated.
The electrostatic surface potential on the surface of the different R-modules was visualized by the APBS Plugin written by M. Lerner (32). Ca²⁺ ions could not be incorporated, and therefore aspartic acid and glutamic acid residues pointing into the β-roll were artificially protonated (Fig. 1). PDB files were transformed to pqr files by pdb2pqr (33), and these pqr files were used as input files for the APBS calculation.
Line Width-A Lorentzian line shape function was fitted to traces taken through 10 different peaks of each module of AlgE4 with the line width as variable parameter. Peaks used for the R-module were assigned as N/Hᴺ of Asp-14, Leu-16, Gly-21, Gly-46, Thr-51, Phe-52, Glu-76, Asp-83, Ala-118, and Ala-131. For the A-module, no assignment was available. Therefore, the ¹⁵N-HSQC spectrum of AlgE4 was overlaid with a spectrum of the A-module. Ten peaks definitively belonging to the A-module were picked from both spectra, and the line width was determined using Equation 1, where w is the line width at half height in Hz; x is the point in Hz, and t is the center of the Lorentzian curve in Hz.
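The line width fit described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a unit-area Lorentzian with full width at half height w and center t, solves the peak amplitude by linear least squares, and searches over w with a ternary search. The function names, the search bounds, and the synthetic trace are hypothetical.

```python
import math

def lorentzian(x, t, w):
    """Unit-area Lorentzian: t = centre (Hz), w = full width at half height (Hz)."""
    half = w / 2.0
    return (half / math.pi) / ((x - t) ** 2 + half ** 2)

def sse_for_width(xs, ys, t, w):
    """Sum of squared residuals for width w, with the amplitude solved linearly."""
    fs = [lorentzian(x, t, w) for x in xs]
    amp = sum(y * f for y, f in zip(ys, fs)) / sum(f * f for f in fs)
    return sum((y - amp * f) ** 2 for y, f in zip(ys, fs))

def fit_width(xs, ys, t, w_lo=2.0, w_hi=100.0, iterations=80):
    """Ternary search for the width minimising the residuals (assumes a single minimum)."""
    for _ in range(iterations):
        m1 = w_lo + (w_hi - w_lo) / 3.0
        m2 = w_hi - (w_hi - w_lo) / 3.0
        if sse_for_width(xs, ys, t, m1) < sse_for_width(xs, ys, t, m2):
            w_hi = m2
        else:
            w_lo = m1
    return (w_lo + w_hi) / 2.0

# Synthetic trace (hypothetical numbers): a 20 Hz wide peak centred at 0 Hz.
xs = [float(i) for i in range(-200, 201)]
ys = [3.0 * lorentzian(x, 0.0, 20.0) for x in xs]
w_fit = fit_width(xs, ys, 0.0)
```

In practice a general nonlinear least-squares routine would fit center, width, and amplitude together; the one-parameter search is used here only to keep the sketch dependency-free.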
Alginate Binding by NMR-Binding of alginate oligomers to AlgE4R, AlgE6R1, AlgE6R2, and AlgE6R3 was investigated by NMR at 25°C. The titration of alginate to the protein was done as described previously (15). For NMR measurements, ¹⁵N-labeled material was produced as described above. The protein concentrations used were between 0.2 and 0.5 mM. Protein concentration was determined by measuring A280 nm using a Nanodrop ND-1000 spectrophotometer and calculated by using the theoretical extinction coefficient obtained from the ProtParam software (37). R-module titrations were performed by adding small aliquots of alginate oligomers in high concentrations (highest stock concentration 135.6 mM for M3) until no changes in the chemical shift were observed in the ¹⁵N-HSQC. By using concentrated stock solutions, the volume increase in the sample was kept at a minimum.
The chemical shift changes of N and Hᴺ atoms from the backbone of the R-module upon titration are given as an absolute change in chemical shift by Equation 2, where Δδ_abs is the absolute change in chemical shift (Hz); Δδ_H is the chemical shift change of the amide proton (Hz); Δδ_N is the change in chemical shift of the amide nitrogen atom (Hz), and x is the constant to achieve equal contribution from changes in N and Hᴺ shifts. The constant was set to 5.
The absolute change in chemical shift was plotted versus residue. Residues that experience a Δδ_abs > 100 Hz were considered to be affected by binding. Approximately seven of the strongest shifting peaks were used to calculate dissociation constants by fitting the experimental data to Equation 3,
$$\Delta\delta_{obs} = \Delta\delta_{bound}\,\frac{[PL]}{[P]_0}, \qquad [PL] = \frac{([P]_0 + [L]_0 + K_d) - \sqrt{([P]_0 + [L]_0 + K_d)^2 - 4[P]_0[L]_0}}{2} \tag{3}$$
where Δδ_obs is the observed chemical shift change at a certain point in the titration (Hz); Δδ_bound is the chemical shift change of the atom in the protein at full ligand binding (Hz); [PL] is the protein-ligand complex concentration (mol/liter); [P]₀ is the total protein concentration (free and bound) (mol/liter); [L]₀ is the total ligand concentration (free and bound) (mol/liter), and K_d is the dissociation constant (mol/liter).
Dissociation Constant Determination by ITC-Calorimetric measurements were carried out with a MicroCal VP-ITC microcalorimeter at 25°C, and the calorimeter was controlled by VPViewer 2000 Version 1.4.8 (MicroCal). For the R-module from AlgE4, alginate oligomers dissolved in 20 mM HEPES, pH 6.9, and 50 mM CaCl₂ were added to a 0.04 mM R-module solution in 20 mM HEPES, pH 6.9, with 50 mM CaCl₂, and the energy release/consumption was measured. For the R-modules from AlgE6, alginate oligomers dissolved in 20 mM HEPES, pH 6.9, 25 mM CaCl₂, and 40 mM NaCl were added to a 0.04 mM R-module solution in 20 mM HEPES, pH 6.9, with 25 mM CaCl₂ and 40 mM NaCl. For the titration of AlgE6R1R2R3 and ORF-9, alginate oligomers dissolved in 20 mM HEPES, pH 6.9, 20 mM CaCl₂, and 40 mM NaCl were added to a 0.04 mM protein solution in 20 mM HEPES, pH 6.9, with 20 mM CaCl₂ and 40 mM NaCl. The heat of dilution for the alginate oligomers in the buffer was recorded separately and subtracted from the titration data prior to analysis. The data were fitted with a nonlinear least squares algorithm using a one- or two-site binding model in the Origin 7.0 software.
This yields the stoichiometry (N), the equilibrium association constant (K_a; liter/mol), and the enthalpy change (ΔH_r, J/mol) of the reaction. The changes in reaction free energy (ΔG_r, J/mol) and entropy (ΔS_r, J/mol·K), as well as the K_d, were calculated using Equation 4,
$$\Delta G_r = -RT \ln K_a = \Delta H_r - T\Delta S_r, \qquad K_d = 1/K_a \tag{4}$$
where R is the gas constant and T is the absolute temperature.
FIGURE 1. Alignment of the R-modules of AlgE6 and AlgE4. Positively (Lys and Arg) and negatively (Glu and Asp) charged residues are colored in blue and red, respectively. His residues are labeled in cyan. Asp and Glu residues that point into the β-roll for calcium coordination are marked in green. For the electrostatic surface potential calculation these amino acids were artificially protonated. To ease the comparison of the R-modules the alignment of the β-roll is defined from the first identical amino acid (Gly-1 in AlgE4R(Gly-387), Gly-2 in AlgE6R1(Gly-383), and Gly-11 in both AlgE6R2(Gly-544) and AlgE6R3(Gly-705)) (indicated by a red arrow). * indicates residues that are identical in all four R-modules; : denotes conserved amino acids, and . shows semi-conserved amino acids (explanation for the symbols can be found at ClustalW2) (30). The white arrows indicate β-strands in the R-modules, and gray indicates that the β-strand only occurs in AlgE6R2 and AlgE6R3.
SAXS Measurement-The small angle x-ray scattering measurements were performed on the laboratory-based instrument at Aarhus University, Denmark (38). It uses a rotating anode as source and multilayer mirrors, and it is a two-pinhole setup where the second pinhole, situated immediately before the sample, is a home-built scatterless pinhole. The instrument has a flux of 3 × 10⁸ counts s⁻¹. The sample to detector distance was set to 640 cm, giving a q range of 0.01-0.345 Å⁻¹, where q is the length of the scattering vector defined as q = 4π sin(θ)/λ, where λ is the x-ray wavelength at 1.54 Å, and 2θ is the scattering angle between the incident and scattered beam.
The instrument uses a two-dimensional position sensitive gas detector (HiSTAR), and the recorded data are corrected for variation in detector sensitivity and distortion according to standard procedures (38). The data were collected with the sample in a reusable quartz capillary, and all measurements were carried out at 4°C. The scattering patterns were collected for between 3 and 4 h, depending on protein concentration. The protein concentrations were between 1 and 2 mg/ml. Additionally, scattering patterns of the buffers were collected for all samples and used for background subtraction. Conversion of the data to absolute scale by use of water as a primary standard was performed using the SUPERSAXS program package. The final intensity is displayed as a function of the modulus of the scattering vector q.
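As a small worked example of the scattering vector, the sketch below converts between the scattering angle 2θ and q using q = 4π sin(θ)/λ with the 1.54 Å wavelength stated above. The function names are hypothetical, and the real-space length 2π/q is included only as a common rule of thumb, not as part of the original analysis.

```python
import math

WAVELENGTH_A = 1.54  # x-ray wavelength in Å, as stated in the text

def q_from_two_theta(two_theta_deg, wavelength=WAVELENGTH_A):
    """Length of the scattering vector (1/Å) for a scattering angle 2*theta in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength

def two_theta_from_q(q, wavelength=WAVELENGTH_A):
    """Scattering angle 2*theta in degrees for a given q (1/Å)."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))

def real_space_length(q):
    """Characteristic real-space distance (Å) probed at q, using d = 2*pi/q."""
    return 2.0 * math.pi / q

# Scattering angle corresponding to the upper end of the reported q range.
angle_at_q_max = two_theta_from_q(0.345)
```

The upper limit of the reported q range corresponds to a scattering angle of a few degrees, which is why the technique is referred to as small angle scattering.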
SAXS Analysis and Modeling-An indirect Fourier transformation (IFT) was used to obtain the pair distance distribution function, p(r). This was done using the program WIFT (39). From the IFT, several parameters were obtained as follows: the maximum distance within the particle, D_max; the radius of gyration, R_g; and the forward scattering, I(q = 0). Using the forward scattering, the molecular mass was determined for the molecule in solution as shown in Equation 5, where c is the protein concentration, and Δρ_m is the scattering length density difference per unit mass for which a standard value of 2.0 × 10¹⁰ cm/g was used.
The scattering data of the single modules were compared with atomic resolution structures. The known PDB files were used for AlgE4A, AlgE4R, AlgE6R1, AlgE6R2, and AlgE6R3; as no PDB file of AlgE6A was available, a homology model was generated by SwissModel (40). The theoretical scattering for these structures in solution was computed using spherical harmonics, taking a hydration layer on the molecule into account, with the program CRYSOL (41). The discrepancy between the experimental scattering data and the computed scattering pattern was minimized, and the values are reported.
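The mass determination from the forward scattering can be illustrated with a short calculation. The sketch assumes that Equation 5 has the standard form M = I(0)·N_A/(c·Δρ_m²), with I(0) on absolute scale in cm⁻¹ and c in g/cm³; the function name and the example numbers are hypothetical and are not taken from Table 3.

```python
AVOGADRO = 6.02214076e23  # Avogadro constant, 1/mol

def molecular_mass(i0_per_cm, conc_mg_per_ml, delta_rho_m=2.0e10):
    """Molecular mass (g/mol) from forward scattering on absolute scale.

    i0_per_cm      : I(q = 0) in 1/cm
    conc_mg_per_ml : protein concentration in mg/ml
    delta_rho_m    : scattering length density difference per unit mass (cm/g)
    """
    conc_g_per_cm3 = conc_mg_per_ml * 1e-3  # 1 mg/ml = 1e-3 g/cm^3
    return i0_per_cm * AVOGADRO / (conc_g_per_cm3 * delta_rho_m ** 2)

# Hypothetical example: I(0) = 0.02 1/cm measured at 2 mg/ml.
example_mass = molecular_mass(0.02, 2.0)
```

With these illustrative inputs the result is about 15 kDa, the order of magnitude expected for a single small protein module.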
Flexible structures are analyzed using the Ensemble Optimization Method (42), where the flexibility is modeled by representing the scattering data as an ensemble of protein conformations. The structures constituting the ensemble are selected by a genetic algorithm from a large pool (10,000 conformations) of randomly generated structures. The structures are selected to minimize the discrepancy between the average scattering profile of the ensemble and the experimental scattering data.
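The selection criterion behind the ensemble approach can be illustrated with a toy example. The sketch below is not the Ensemble Optimization Method itself: an exhaustive search over a tiny pool stands in for the genetic algorithm, and the profiles, error values, and function names are hypothetical. It only shows how an ensemble is scored by the discrepancy between its averaged profile and the experimental data.

```python
from itertools import combinations

def ensemble_average(profiles):
    """Point-by-point average of several scattering profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def chi_square(i_exp, sigma, i_model):
    """Mean squared error-weighted deviation between data and model."""
    terms = [((e - m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma)]
    return sum(terms) / len(terms)

def best_ensemble(pool, i_exp, sigma, size):
    """Try every subset of the given size and return (chi2, indices) of the best one."""
    best = None
    for idx in combinations(range(len(pool)), size):
        chi2 = chi_square(i_exp, sigma, ensemble_average([pool[i] for i in idx]))
        if best is None or chi2 < best[0]:
            best = (chi2, idx)
    return best

# Hypothetical pool of four three-point profiles; the "experimental" curve
# is constructed as the average of conformations 0 and 3.
pool = [[10.0, 8.0, 6.0], [9.0, 5.0, 2.0], [12.0, 11.0, 10.0], [6.0, 3.0, 1.0]]
i_exp = ensemble_average([pool[0], pool[3]])
sigma = [0.1, 0.1, 0.1]
chi2, members = best_ensemble(pool, i_exp, sigma, size=2)
```

For a pool of 10,000 conformations the number of subsets is far too large for exhaustive search, which is why a heuristic such as a genetic algorithm is needed.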
Scattering from the full-length proteins was modeled using rigid-body optimization of models for the modules. The relative position of the individual modules was optimized to ensure the best agreement with the experimental scattering data. The optimization was performed using a simulated annealing protocol, where interconnection between the modules was imposed and steric clashes were avoided. The interconnection, i.e. the linker region, was determined from the amino acid sequence. For AlgE4, the linker sequence (GATPQQPST) was inserted between Pro-377 and Thr-386. For AlgE6, the linker sequences were defined as follows: ³⁷⁵GTVSAPPQ³⁸², ⁵³⁰ATPGD⁵³⁴, and ⁶⁸³ADNILFATPVPVD⁶⁹⁵. Between the R-modules of AlgE6, the aforementioned linker residues were deleted from the atomic resolution structures and re-inserted as amino acids with a physical size of 3.5 Å each. This procedure was implemented in the program BUNCH (43). The models obtained were not unique, due to the randomness involved in the search. Therefore, a minimum of 10 runs were performed, and the individual models were aligned, compared, and filtered using the programs SUPCOMP and DAMAVER (44). In comparing the models, a similarity measure called the normalized spatial discrepancy (NSD) was obtained, and the most representative model from the set of models was provided.

Sequence Alignment of the R-modules of AlgE4 and AlgE6 Reveals Properties Important for Enzyme-Substrate Interactions-
The sequences of the three R-modules from AlgE6 were aligned with the R-module of AlgE4 to determine differences in the primary structure. The last R-module of each extracellular alginate epimerase from A. vinelandii has an unstructured peptide of ~20 amino acids that might be essential for secretion via an ATP-binding cassette transporter to the extracellular environment (45). For the following comparison, however, this tail was not included. The amino acid identity between the four R-modules is relatively high, i.e. 75 out of 150 amino acids or 50% identity (Fig. 1). In addition, many of the amino acid differences can be classified as conserved or semi-conserved, and the differences were not equally distributed throughout the sequences. Of the first 18 amino acids (where all four sequences are aligned), only four were identical. Identity increases to 10 for the next 18 amino acids. In the remainder of the alignment, nearly all the amino acids were conserved, except for two regions of five amino acids. As the epimerase substrate alginate is a polyanionic polymer, positively charged amino acids can be anticipated to be important for the substrate interaction. Overall, AlgE4R has 11 positive amino acids in total, whereas AlgE6R1 has 10, and both AlgE6R2 and R3 have nine positive amino acids. The alignment shows that AlgE4R has two additional positively charged amino acids, Lys-103 and Arg-124 (numbering based on AlgE4R), which both are serines in the R-modules from AlgE6. Furthermore, AlgE4R also has Arg-24, which is only found in R1 of AlgE6. AlgE6R3 has an additional arginine at amino acid position 71 compared with the other R-modules. Histidine 133 is conserved throughout the R-modules, and additional histidines are found at position 62 in AlgE6R1, 8 in AlgE6R2, and 8 and 62 in AlgE6R3. These can potentially also interact with alginate.
If only the three R-modules of AlgE6 are compared, 67% of all amino acids were identical, and at only four positions do all three of the R-modules have different amino acids (Fig. 1). After the first 36 amino acids (in AlgE6R1) or 45 amino acids (in AlgE6R2 and AlgE6R3), nearly every amino acid was identical. Moreover, AlgE6R2 and AlgE6R3 have an N-terminal nonapeptide constituting a proline-rich linker between the R-modules of AlgE6.
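The identity figures quoted above are simple column counts over a multiple alignment. A minimal sketch of that bookkeeping is given below; the aligned fragments are hypothetical placeholders rather than the real R-module sequences, and the function names are illustrative.

```python
def identical_columns(aligned):
    """Count alignment columns in which every sequence carries the same residue.

    Columns containing a gap character ('-') are never counted as identical.
    """
    return sum(1 for col in zip(*aligned) if "-" not in col and len(set(col)) == 1)

def percent_identity(n_identical, length):
    """Identity as a percentage of the compared length."""
    return 100.0 * n_identical / length

# Hypothetical aligned fragments of four sequences (not the real R-modules).
toy_alignment = ["GGAGNDTL", "GGSGNDVL", "GGAGKDTL", "GG-GNDTL"]
n_same = identical_columns(toy_alignment)
toy_identity = percent_identity(n_same, len(toy_alignment[0]))

# The figure reported in the text: 75 identical positions out of 150.
reported_identity = percent_identity(75, 150)
```

Here the gapped column and the three columns with substitutions are excluded, leaving five identical columns out of eight.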
Overall Structure of Individual R-modules from AlgE4 and AlgE6 Are Very Similar-The structures of the three R-modules of AlgE6 were determined by NMR on the basis of NOE upper distance limits and torsion angle constraints obtained by the TALOS+ software (46). The experimental data and structural statistics for the three structures are summarized in Table 2. The overall three-dimensional structural folds of the three R-modules of AlgE6 are very similar to each other and to AlgE4R (Fig. 2). The geometrical constraints and coordinate files were deposited in the Protein Data Bank under the accession codes 2ML1 (AlgE6R1), 2ML2 (AlgE6R2), and 2ML3 (AlgE6R3).
The first 54 amino acid residues of AlgE6R1 or 63 of AlgE6R2 and AlgE6R3, respectively, form a right-handed parallel β-roll where each complete turn consists of 18 amino acids or two RTX motifs (Repeat in ToXin) (47). The sequence of the RTX motif consists of the nonapeptide GGXGXDZUZ, where the glycine and aspartic acid residues are highly conserved. U is always a large hydrophobic amino acid, mainly leucine and sometimes replaced by isoleucine, valine, or phenylalanine. X can be any amino acid but mostly one with a short side chain, and Z is an amino acid with a long side chain. The first six amino acids form a tight loop that also binds Ca²⁺, whereas the last three amino acids form a short β-strand. The compact structure of the β-roll is opened by the sequence FRF (53-55 in AlgE6R1, 63-65 in AlgE6R2 and AlgE6R3) followed by an antiparallel hairpin structure. The chain folds back and makes a new but less defined turn extending the β-roll. The next 10 amino acids are less well defined and are followed by three long antiparallel β-strands (Fig. 2). The last β-strand is also parallel to the penultimate β-strand of the β-roll. There is an additional long loop, and the last β-strand completes the β-roll. The last 21 amino acids of AlgE6R3 are unstructured and, as commented earlier, may be an essential sequence for secretion (45).
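The RTX nonapeptide GGXGXDZUZ described above lends itself to a simple pattern search. The sketch below is a strict approximation under stated assumptions: the Gly and Asp positions are treated as invariant although the text only calls them highly conserved, X and Z are left unrestricted, and U is limited to Leu, Ile, Val, or Phe. The test sequence is hypothetical and is not taken from any R-module.

```python
import re

# G-G-X-G-X-D-Z-U-Z: X and Z match any residue (an assumption), U is one of
# the hydrophobic residues named in the text. The lookahead lets overlapping
# candidates be reported as well.
RTX_PATTERN = re.compile(r"(?=(GG.G.D.[LIVF].))")

def find_rtx_motifs(sequence):
    """Return (zero-based start, nonapeptide) for every RTX-like match."""
    return [(m.start(1), m.group(1)) for m in RTX_PATTERN.finditer(sequence)]

# Hypothetical sequence containing two consecutive motifs.
toy_sequence = "MSGGAGNDTLTGGSGNDVIRAA"
hits = find_rtx_motifs(toy_sequence)
```

Two hits spaced nine residues apart, as in this toy sequence, correspond to the 18-residue unit that makes up one complete turn of the β-roll.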
Overall, the R-modules from AlgE6 consist of the same secondary structural elements and overall architecture as exhibited by the structure of the AlgE4 R-module. The major differences between all the R-modules are in the orientation of the hairpin structure and the orientation of the antiparallel β-strands relative to the β-roll (Fig. 2).
a Core R-modules are from the first identical amino acid (Gly-1 in AlgE4R, Gly-3 in AlgE6R1, Gly-11 in AlgE6R2, and Gly-11 in AlgE6R3; see also supplemental Fig. S1) until the C-terminal end of AlgE6R1 and AlgE6R2 or the amino acid that is in the same position when the four structures are aligned (Thr-150 in AlgE4R and Ala-162 in AlgE6R3).
b The secondary structure elements are only the β-strands (Figs. 1 and 2).
Overall, the structures of the R-modules are very similar to each other, and they have the general structure as described in the following. The structured N-terminal end of the R-modules is formed by a β-roll defined by three amino acids forming a short β-strand and six amino acids forming a loop to the next short β-strand followed by six amino acids forming the next loop, resulting in an 18-amino acid β-roll unit. This structure is repeated three times making up three turns of the β-roll, and then a loop forms an antiparallel β-hairpin. The polypeptide chain folds back and makes a new turn elongating the β-roll. Hereafter, a longer loop is bulging out, followed by a less well defined region. The structure ends with two antiparallel β-strands. The last β-strand links them to one of the parallel β-sheets making up the β-roll. The β-strand is situated in between the roll and the antiparallel β-sheet, being parallel to the first and antiparallel to the latter.
Charge Distribution on the Surface of the R-modules-As pointed out above, it is likely that the positively charged amino acids in the R-modules are important for interaction with the polyanionic alginate polymer. The electrostatic surfaces of the R-modules have been calculated and are visualized in Fig. 3. In general, the front sides of the R-modules all display positively charged regions along a shallow groove perpendicular to the β-strands, and as shown previously for AlgE4R (see below and Ref. 15), this surface is able to bind pentameric mannuronate. The electrostatic surface of AlgE6R1 is similar to the surface of AlgE4R, but with a less pronounced positive potential. AlgE6R2 and AlgE6R3 do not show as extended and clear a charge separation as AlgE6R1 and AlgE4R. The charges on the surface of AlgE6R2 are more spread and less concentrated than in AlgE4R. AlgE6R3 has a groove on the front side that shows a strong positive potential. This groove is made up by Arg-40, Lys-42, Arg-71, and Arg-110, where all except Arg-71 are conserved throughout the R-modules. AlgE6R1 and AlgE6R2 have a leucine in this position. Except for this dense and positively charged spot on AlgE6R3, the rest of the electrostatic surface has far fewer patches with positive potential than the other R-modules.
Module Orientation and Overall Shape of AlgE4 and AlgE6-Small-angle x-ray scattering data (Fig. 4) were collected for the R- and A-modules separately and for full-length AlgE4 and AlgE6. These data were subsequently analyzed using the model-independent IFT method (48), which can provide information on the overall size and shape of alginate epimerases. This method yields the pair distance distribution function, p(r), and the maximum distance of the molecule, D_max. Furthermore, data on the radius of gyration, R_g, and forward scattering, I(q = 0), were obtained, facilitating determination of the molecular weight. All four R-modules are approximately spherical in shape, as seen from the bell-shaped p(r) functions, and have a maximum distance between 45 and 50 Å (Table 3 and Fig. 4, A and B). Several of the p(r) functions obtained for the R-modules had their maximum shifted to the left of D_max/2, suggesting that the shape is better described as ellipsoidal than as fully spherical. The molecular weights determined using the forward scattering (Table 3) correspond to monomeric molecules for the R-modules, and the same holds for AlgE4A, AlgE4, and AlgE6. The AlgE4A domain has a slightly elongated shape with a maximum distance of 65 Å (Fig. 4B). The slight elongation was seen from the position of the maximum of the curve, which is shifted slightly to the left of D_max/2. The whole AlgE4 has a maximum distance of 100 Å, and the shape of the p(r) function corresponds to that of an elongated particle, seen from the significant shift of the maximum to the left (Fig. 4B). For AlgE6, the scattering indicates an elongated shape with a maximum distance of 180 Å (Table 3 and Fig. 4B). The shape of the p(r) function for AlgE6 also corresponds to an elongated shape based on the shift of the maximum to the left (Table 3 and Fig. 4D). The R_g values obtained from the IFT analysis agree with those obtained using the Guinier approximation (Table 3 and supplemental Fig. S1).
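The Guinier estimate of R_g referred to above can be illustrated with a minimal sketch. It assumes the standard Guinier approximation, ln I(q) = ln I(0) - (R_g² q²)/3, which holds only at low q (roughly q·R_g < 1.3). The function name `guinier_fit` and the synthetic data are hypothetical and are not taken from the analysis software or measurements of this study.

```python
import math


def guinier_fit(q_values, intensities):
    """Fit ln I(q) = ln I(0) - (R_g^2 / 3) * q^2 by ordinary least squares.

    Returns (R_g, I(0)). Inputs must be restricted to the Guinier region
    by the caller; this sketch does not select the region automatically.
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data do not show a Guinier region")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free example (hypothetical values): R_g = 20 Å, I(0) = 100.
q_demo = [0.010 + 0.005 * k for k in range(9)]  # q in 1/Å, q * R_g <= 1.0
i_demo = [100.0 * math.exp(-(20.0 ** 2) * q * q / 3.0) for q in q_demo]
r_g_demo, i_zero_demo = guinier_fit(q_demo, i_demo)
```

With real data, the fitted range would be chosen so that aggregation-related upturns at the lowest q values are excluded before fitting.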
To further investigate the solution structure of the A- and R-modules of AlgE4, a priori information was utilized. Atomic resolution structures are known for the two modules, and these were used to calculate the theoretical scattering patterns. We compared these to the experimentally obtained scattering data using the program CRYSOL (Fig. 4C) (41). The theoretical patterns agree well with the scattering data, with reduced χ² values of 1.91 for the R-module and 1.94 for the A-module.
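The reduced χ² values quoted for these comparisons measure how well a calculated scattering curve reproduces the data within the experimental errors. The following is a minimal sketch, assuming a commonly used definition with a single fitted scale factor and N - 1 degrees of freedom; the exact normalization and additional fitting parameters used by CRYSOL may differ. The helper name `reduced_chi2` is hypothetical.

```python
def reduced_chi2(i_exp, i_calc, sigma):
    """Reduced chi-square between experimental and calculated intensities.

    A single scale factor c is fitted analytically (weighted least squares),
    then chi2 = sum(((I_exp - c * I_calc) / sigma)^2) / (N - 1).
    """
    if not (len(i_exp) == len(i_calc) == len(sigma)) or len(i_exp) < 2:
        raise ValueError("need at least two points and equal-length inputs")
    weights = [1.0 / (s * s) for s in sigma]
    numerator = sum(w * e * c for w, e, c in zip(weights, i_exp, i_calc))
    denominator = sum(w * c * c for w, c in zip(weights, i_calc))
    scale = numerator / denominator
    residual_sum = sum(
        w * (e - scale * c) ** 2 for w, e, c in zip(weights, i_exp, i_calc)
    )
    return residual_sum / (len(i_exp) - 1)
```

Values close to 1 indicate agreement within the error bars, whereas values far above 1 indicate a systematic mismatch between the model and the data.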
The solution structure of full-length AlgE4 was determined from the SAXS data by rigid body modeling using the module structures with atomic resolution. These were connected by dummy residues to mimic the linkers between the different modules. The optimization of the modules relative to each other was performed using the program BUNCH (43), which uses a simulated annealing procedure in which steric clashes are avoided and the modules remain interconnected. Multiple BUNCH runs were performed, and only one population of structures was found, with a normalized spatial discrepancy (NSD) of 1.11. The models were compared and averaged using the program package DAMAVER (44). Here, the most representative model was also determined, defined as the model having the highest degree of similarity to all the other models. This model yields a good fit with a reduced χ² of 1.23 (Fig. 4C). Modeling of the entire AlgE4 (Fig. 4D) shows an elongated overall shape that is also in good agreement with the p(r) function.
SAXS data collected for full-length AlgE6 and its individual R-modules were used for modeling the whole AlgE6. From the initial data analysis, the molecular weights and the D_max values suggest the presence of monomeric proteins in all solutions (Table 3). However, for the R-modules, data below q = 0.02 Å⁻¹ were discarded because of an upturn in the data caused by large aggregates in the solution, which could not be removed during purification. This does not significantly affect the scattering from the single molecules because the aggregates are much larger than the single molecules, and a clear Guinier region is still observed for the single molecules (Fig. 4A and supplemental Fig. S1). To further investigate the solution structures of the individual modules, the theoretical scattering patterns from the available models with atomic resolution, for the A- and R-modules, were calculated using the program CRYSOL (Fig. 4C) (41). From the computation (Fig. 4C), it is evident that the AlgE6R1 and AlgE6R2 modules have a shape in solution similar to that of their models with atomic resolution. The corresponding reduced χ² values obtained were 1.73 and 4.60. However, for the AlgE6R3 module a clear difference was found between the theoretical scattering pattern and the experimental data, with a reduced χ² value of 37.1 (Fig. 4C). The poor quality of the fit originates from the large discrepancy between data and fit at high q, above 0.15 Å⁻¹, where the model underestimates the data.
This additional intensity could possibly originate from fluctuation scattering, which is observed for flexible protein structures. The presence of flexibility in the protein is also supported by plotting the scattering data in a Kratky plot (Fig. 4F), where an upturn is seen at high q values, indicative of flexibility. To investigate this possible explanation, the additional intensity contribution was described by the scattering from a Gaussian chain (49), which improved the fit substantially, lowering the reduced χ² from 37.1 to 2.05 (Fig. 4C). Thus, some flexibility should be present in the structure, which agrees well with the fact that the R3-module has a 20-residue long flexible tail. To investigate the flexibility of the AlgE6R3 module further, the ensemble optimization method (EOM) (42) was applied. In EOM, a large ensemble of structures was generated, and a subset of these was selected by a genetic algorithm to best fit the scattering data. With this method, a good agreement between the model and the scattering data was obtained, with a reduced χ² of 0.95 (Fig. 4C). It is clear from the R_g distribution of the ensemble before optimization compared with the R_g distribution of the ensemble after optimization (Fig. 4E) that the tail is not stretched away from the main body of AlgE6R3; rather, it is found close to it, which is perhaps better visualized in the models (Fig. 4E, inset). Investigation of the full-length AlgE6 (Fig. 4G) in solution was performed by rigid body modeling of the SAXS data, as was also done for full-length AlgE4. In addition to the module structures with atomic resolution, the 20-residue flexible tail of the AlgE6R3 module was included as dummy residues. The optimization of the modules relative to each other and comparison of multiple runs were performed in the same manner as for AlgE4. Only one population of structures was found, with an NSD of 1.08, and the most representative model yields a good fit with a reduced χ² value of 1.58 (Fig. 4C). A comparison of the SAXS data with a model for AlgE6 in a fully stretched state was also done by calculating the theoretical scattering of such a model with CRYSOL. From the fit quality, it is evident that this structure cannot describe the experimental scattering data satisfactorily, yielding a reduced χ² value of 20.6 (data not shown).

The data in A and C are rescaled to improve visibility of all data. The data are scaled in the following manner: AlgE4R factor 10^0; AlgE4A factor 10^1; AlgE4 factor 10^2; AlgE6R1 factor 10^3; AlgE6R2 factor 10^4; AlgE6R3 factor 10^5; and AlgE6 factor 10^6. D, best AlgE4 model in red overlaid with the remaining nine models (gray semitransparent surface). E, distribution of the R_g value from the EOM ensemble before optimization of the ensemble (blue area) and after (red area). The models are shown as an inset where the main part of AlgE6R3 is red and the flexible tail is blue. F, Kratky plot (I(q)q² versus q) of AlgE6R3. G, best AlgE6 model in red overlaid with the remaining nine models (gray semitransparent surface). The models are overlaid to obtain the best overall superposition of the entire model. This is done using the programs SUPCOMB and DAMAVER.

TABLE 3 Results obtained from the SAXS data by performing an IFT analysis
Molecular weight was calculated from I(q = 0). The calculated molecular weights were determined by use of ProtParam (60).
NOVEMBER 7, 2014 • VOLUME 289 • NUMBER 45

R-modules in Alginate Epimerases
Line Width-The dynamic behavior of molecules is also reflected in NMR data, for example in the line width and intensity of the peaks. The slower a molecule reorients in solution, the broader the peaks observed in the spectrum, which is known as line broadening. The line widths of 10 peaks each belonging to the A-module and to the R-module in uniformly 2H,15N-labeled AlgE4 (21) were measured. For comparison, the line width of the same 10 peaks of the A-module was measured in U-2H,15N-AlgE4A. The average line width of the peaks from the R-module in AlgE4 is 12.58 ± 1.82 Hz and is significantly narrower than that of the peaks from the A-module in AlgE4 (16.67 ± 3.06 Hz), supporting the hypothesis that the modules have a flexible linker between them. Another indication of a flexible linker between the A- and the R-module in the whole AlgE4 is the intensity of the peaks. The volume and height of a peak from the R-module are larger than those of a peak from the A-module, also indicating that the A- and R-modules relax at different rates (Fig. 5). In silico analysis of the linker region between the A- and R-modules of AlgE4 predicted no secondary structure elements (50, 51), and the linker region is relatively proline-rich (GEPGATPQQPST).
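To relate the measured line widths to relaxation, a minimal sketch is given here. It assumes ideal Lorentzian line shapes, for which the full width at half-maximum equals 1/(π·T2*), and it ignores contributions from data processing and field inhomogeneity, so the resulting apparent T2* values are only rough illustrations. The function name `apparent_t2_ms` is hypothetical; the two line widths are the averages reported above.

```python
import math


def apparent_t2_ms(line_width_hz):
    """Apparent transverse relaxation time (ms) from a Lorentzian line width.

    Uses FWHM = 1 / (pi * T2*), so T2* = 1 / (pi * FWHM).
    """
    if line_width_hz <= 0:
        raise ValueError("line width must be positive")
    return 1000.0 / (math.pi * line_width_hz)


# Average line widths reported in the text for full-length AlgE4.
t2_r_module = apparent_t2_ms(12.58)  # R-module peaks
t2_a_module = apparent_t2_ms(16.67)  # A-module peaks
```

Under these assumptions the R-module peaks correspond to a longer apparent T2* than the A-module peaks, in line with the two modules reorienting partly independently.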
Right, cross-section through a representative peak from the A-module (left) and R-module (right), respectively. The peak belonging to the R-module is approximately three to four times higher than the peak from the A-module and is significantly narrower. This shows that the two modules have some degree of conformational freedom. Neither SDS-PAGE nor MS data (data not shown) show any degradation of AlgE4 into the A- and R-module in the NMR sample used.
Alginate Binding Studies-Previously, it has been shown that both the A- and R-modules of AlgE4 bind oligomeric mannuronate (13, 15, 52). In this study, this was investigated in more detail by performing binding studies between AlgE4R and alginates that varied in length and M/G composition using NMR and ITC. Chemical shift perturbations of individual amino acids obtained for AlgE4R during interaction with different alginate oligomers using NMR are shown in Fig. 6A. Here, amino acids that are significantly affected by interaction with a given alginate oligomer (M3-M6, MG4, and MG7) are indicated along the amino acid sequence of the R-module. Most of these amino acids are located in three clusters for AlgE4R. The first cluster covers residues 38-43, and the second cluster covers residues 61-71. The last cluster is located between amino acids 101 and 136; however, not all amino acids in this range are affected. The length and type of alginate oligomer have little influence on which of the amino acids are most affected by the interaction. The surface area of AlgE4R that is affected by interaction with alginate oligomers is visualized in Fig. 6B. Most of the amino acids that experience major shift changes are clustered on the front side of AlgE4R at the antiparallel β-hairpin and the long antiparallel β-strands.
Besides the chemical shift perturbation data obtained by NMR, the affinity of different alginate oligomers for AlgE4R was determined by ITC. The dissociation constants and thermodynamic data are summarized in Table 4 and shown in Fig. 7. The NMR and ITC data are similar and show that the dissociation constant decreases with increasing length of the alginate oligomer. For M8 and higher, the ITC data suggested a binding ratio of 1:2, where two AlgE4R can bind to one oligomer chain. Because of the high dissociation constants for the interaction between AlgE4R and MG oligomers, the ITC measurements could not be performed under optimal conditions, which inevitably leads to high uncertainty in the obtained results. However, the data clearly show that the R-module of AlgE4 has higher affinity for poly-M than for poly-MG with the same degree of polymerization. Titration of AlgE4R with poly-G oligomers did not reveal any measurable interaction. The thermodynamic parameters (Table 4) for the AlgE4-alginate oligomer interaction clearly show an enthalpy-driven association with an entropy penalty from binding the oligomer.
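The statement that binding is enthalpy-driven with an entropy penalty follows from the standard relations ΔG = RT ln K_d (with K_d expressed relative to a 1 M standard state) and ΔG = ΔH - TΔS. The sketch below illustrates this bookkeeping; the function name `binding_thermodynamics` and the example numbers are hypothetical and are not values from Table 4.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def binding_thermodynamics(kd_molar, delta_h_kj, temperature_k=298.15):
    """Return (delta_G, minus_T_delta_S) in kJ/mol for a binding reaction.

    delta_G = R * T * ln(K_d), with K_d in mol/L (1 M standard state).
    minus_T_delta_S = delta_G - delta_H; a positive value is an entropy penalty.
    """
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    delta_g_kj = GAS_CONSTANT * temperature_k * math.log(kd_molar) / 1000.0
    minus_t_delta_s_kj = delta_g_kj - delta_h_kj
    return delta_g_kj, minus_t_delta_s_kj


# Hypothetical example: K_d = 100 µM and delta_H = -40 kJ/mol at 25 °C.
example_dg, example_minus_tds = binding_thermodynamics(1e-4, -40.0)
```

In this hypothetical case ΔH is more negative than ΔG, so -TΔS is positive: the association is favored by enthalpy and opposed by entropy, the same pattern described for AlgE4R.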
FIGURE 6. Chemical shift perturbation patterns of alginate oligomer binding on the AlgE4R module measured by NMR. A, amino acid residues that experience a chemical shift change |Δδ_bound| of >100 Hz are plotted as a dot along the primary amino acid sequence for the alginate oligomers that were used in the experiments. When more than one amino acid in a row experiences chemical shift changes, it results in a bar, where the length is proportional to the affected area. The alginate oligomer type and length have hardly any influence on the location of the perturbation. B, amino acid residues with a chemical shift change |Δδ_bound| of >100 Hz for the interaction with hexameric M are highlighted in blue on the backbone ribbon structure of the AlgE4R-module.

TABLE 4 Dissociation constants and other thermodynamic data for the R-module from AlgE4 determined at 25°C
Calculations from NMR data were done assuming a 1:1 (n = 1) complex. Nearly all the ITC data could be fitted to n = 1. M8 and M9 could not be fitted with one binding event.

Oligo-M, MG, and G alginates were tested with each individual R-module of AlgE6 by either NMR or ITC. These data revealed that none of the single R-modules of AlgE6 binds to any alginate. This was very surprising because the sequence and structural identity between AlgE4R and the R-modules of AlgE6 are very high. As a consequence of these results, a new construct containing all three R-modules (R1R2R3) from AlgE6 was made. ITC titration of R1R2R3 with M oligomers shows a weak interaction with M10 that grew stronger with increased oligomer length (Fig. 7A and Table 4). Similar ITC results have been obtained for ORF-9, which is a protein of unknown function from A. vinelandii consisting of seven R-modules (Fig. 7A) (53). As it is difficult to prepare pure homo-oligomers with a degree of polymerization above ≈12, the obtained ITC values should be viewed as a general trend for the interaction between protein and alginate. The thermograms for titration of MG13-14 with R1R2R3 showed only a vague indication of an interaction, and therefore, further attempts were not undertaken. Similar to AlgE4R, the thermodynamic parameters for the interaction of R1R2R3 with alginate oligomers indicate an enthalpy-driven association with an entropy penalty.
Construction of New Alginate Epimerases-Based on both ITC and NMR data, it can be concluded that the R-module of AlgE4 has stronger interactions with poly-M and poly-MG than the AlgE6 R-modules. This means that the average contact time per productive binding event between enzyme and substrate should be longer with AlgE4R, as it could stabilize the enzyme-substrate complex. In principle, this would lead to a higher number of sugar units epimerized per enzyme-substrate contact, given the assumption that the reaction rate for one epimerization is similar for both AlgE4 and AlgE6. This hypothesis was tested by swapping the R-modules between AlgE4 and AlgE6. In principle, the combination of the AlgE6 A-module with the AlgE4 R-module (referred to as AlgE64) should result in an enzyme that is able to introduce more G-blocks than AlgE6, as the interaction time per productive binding event should increase due to the higher alginate affinity of AlgE4R. For an enzyme composed of the A-module from AlgE4 and the R-modules from AlgE6 (AlgE46), the result should be production of MG-blocks (as for AlgE4) but with a lower overall G-content than AlgE4. The extra length of the R-modules in AlgE46 might require a longer stretch of the substrate to achieve productive binding. The alginate block composition of poly-M epimerized with AlgE4, AlgE46, AlgE6, and AlgE64 is given in Table 5. Indeed, the end point epimerization results demonstrate clearly that AlgE64 introduces a higher level of G-blocks than AlgE6 and that AlgE46 gives alternating MG structures but with a lower total G-content.

DISCUSSION
Structure and Shape-The overall structures of the R-modules of AlgE4 and AlgE6 are nearly identical, a result of the high primary sequence identity of the four R-modules. There is also a high sequence and structure similarity between the initial β-roll of the R-modules and the RTX domain of the metalloproteases from Pseudomonas aeruginosa (PDB code 1KAP) and Serratia marcescens (PDB code 1SAT), which coordinate one Ca2+ ion between two neighboring loops in the β-roll (54, 55). It has been shown by radioactive assay with 45Ca (16) that the R-modules bind Ca2+ ions. We incorporated the geometry from these Ca2+-binding motifs into the structural calculations as described for AlgE4R (15). Introducing the Ca2+ ions into the AlgE6 R-module structures, assuming they are bound in the same way as in AlgE4R, did not change the structures or introduce additional violations (distance constraints and van der Waals repulsions) for the calculated structures. This indicates that binding of Ca2+ ions in these sites of the R-modules is indeed possible.
The SAXS measurements allow for extracting low resolution structural data for AlgE4 and AlgE6, these being the first structural models of full-length extracellular alginate epimerases. Based on a rigid body model, full-length AlgE4 seems to have an elongated shape, where the R-module continues as an extension of the A-module. The model also shows that a certain molecular flexibility is possible between the A- and R-modules of AlgE4. This correlates well with NMR data of the full-length AlgE4 showing that the average line width of the A-module is 16.67 ± 3.06 Hz, whereas the corresponding value for the R-module is 12.58 ± 1.82 Hz, thus suggesting some degree of orientational flexibility between the two modules. The rigid body model of AlgE6 showed that the modules have a defined orientation relative to each other, and to some extent the R-modules continue as a prolongation of the A-module. Interestingly, the model indicated that the three R-modules do not have a linear orientation but form a bend. Again, the model seems to suggest some orientational flexibility between the modules. The sequences between the R-modules show a high content of prolines, which are known to break secondary structural elements and thus introduce flexibility. It is likely that this flexibility is needed for transport and substrate-product interaction. In total, the results from SAXS and NMR data suggest that the epimerases adopt an overall elongated shape with some flexibility between the individual modules.

FIGURE 7. Results for interaction between R-modules and alginate obtained with ITC and NMR. The graph shows K_d obtained from the ITC (A) and NMR (B) measurements. In general, MG oligomers have higher dissociation constants than poly-M at the same degree of polymerization. NMR data fit well to the data obtained by ITC. K_d depends strongly on the degree of polymerization.

TABLE 5 Distribution of M and G units obtained from end point epimerization of poly-M
Monomers, diads, triads, and average G-block lengths were calculated based on 1H NMR spectra obtained from poly-M epimerized with AlgE46 or AlgE64.

Alginate Binding-The R-module of AlgE4 shows a clear preference for poly-M alginate over MG alginate and displays a well defined binding groove (Figs. 6 and 7). Poly-G alginate was not tested, as it could not be dissolved in the buffer conditions used for AlgE4R. The protein needs Ca2+ to retain its fold; however, poly-G alginate also binds Ca2+, leading to alginate gel formation. Attempts to perform ITC measurements in Ca2+-free buffer failed, which is probably due to the structural instability of the R-module from AlgE4 in the absence of Ca2+ (15). The highest binding energies were measured with M5 and longer oligomers (Figs. 6 and 7). M5 has approximately the same length as the maximal distance between the basic amino acids of AlgE4R (Arg-24 and Lys-103, Fig. 8). Most of the amino acids that are affected by the alginate binding are in the groove with an electropositive surface potential (Fig. 3), and it seems that alginate binds over the entire length of AlgE4R.
The fact that M3 to M5 lead to chemical shift perturbations over nearly the same surface area of the R-module, whereas the dissociation constant of the binding decreases by 10-fold with increasing numbers of M subunits, suggests that multiple binding subsites are available and that short alginate oligomers cannot cover all of them simultaneously but can move freely between different binding subsites in the alginate binding groove. The longer M8 and M9 show two distinct binding events. It is assumed that at a low alginate concentration, two R-modules bind to one alginate chain. At a higher alginate concentration, each R-module binds to one alginate chain. The ITC experimental data, NMR titration, and structural data clearly show a single binding location in the alginate binding groove. The ITC thermogram for poly-MG shows binding of alginate; however, the software was not able to fit the obtained data due to the low amounts of heat generated by this weak binding. To get better ITC results, the protein concentration would have to be 50-200-fold higher, which is not possible. The NMR data also show weak binding for poly-MG, and the obtained binding constant also indicates that ITC titrations are at their limit for poly-MG.
In contrast to the results obtained for AlgE4R, none of the individual R-modules of AlgE6 showed detectable interaction with any alginate oligomer. Weak interaction was measured for the three AlgE6 R1R2R3 modules together with poly-M from M10 to M20, and an interaction with poly-MG was also detected. It is surprising that the alginate affinity of R1R2R3 from AlgE6 is much lower than that of AlgE4R, as the amino acid sequence and also the amount and distribution of charged and polar amino acids are very similar in the assumed binding site (Figs. 1 and 3). Similar results were obtained with the protein ORF-9 (53), which consists entirely of seven R-modules and shows an affinity for alginate oligomers comparable with that of R1R2R3. It should be noted that the R-modules in ORF-9 are more similar to the R-modules of AlgE6 than to AlgE4. The results presented here show that minor differences in the alginate binding groove have a huge impact on the affinity for alginate.
FIGURE 8. Charged amino acids on the assumed alginate-binding surface. A, AlgE4R; B, AlgE6R1; C, AlgE6R2; and D, AlgE6R3. Each R-module has 5-7 basic amino acids (in blue) and 5-7 acidic amino acids (in red) within the assumed alginate binding region of the surface. AlgE4R has its basic amino acid residues more evenly distributed over the assumed alginate-binding surface than the R-modules of AlgE6.
Alginate-binding Site-To explain the different alginate binding behavior of the four R-modules despite their structural similarity and sequence homology, a more detailed assessment of the charge distribution is required. In particular, the R-modules of AlgE6 are very similar to each other in amino acid sequence but vary somewhat in the charge distribution on their surfaces (Fig. 8). The putative alginate-binding site of AlgE4R contains eight basic amino acids. Of these eight amino acids, three are not conserved in all four R-modules (Arg-24, Lys-103, and Arg-124). Lys-103, one of the nonconserved basic amino acids in AlgE4R, is relatively far away from any other positively charged amino acid (~9 Å from Arg-110 and Lys-114). However, Arg-124 and Arg-24 (the other two nonconserved basic amino acids) are close to other basic amino acids. Arg-24 is close to Arg-40 and Arg-42, whereas Arg-124 is surrounded by Arg-62 and Lys-114. Arg-24 introduces a positive surface potential at a spot where AlgE6R2 and AlgE6R3 have a negative surface potential. Arg-124 forms a kind of bridge between Arg-62 and Lys-114, but the positive charge is shielded by negatively charged side chains of the surrounding amino acids (Glu-64, Asp-94, Glu-117, Asp-119, Glu-121, and Glu-126).
All R-modules of AlgE6 have one additional arginine located between two phenylalanines that open the β-roll (FRF, residues 53-55 in AlgE6R1 and residues 63-65 in AlgE6R2 and AlgE6R3). These extra arginines are, however, on the opposite side of the proteins relative to where the binding site is located in AlgE4R. Besides this arginine, AlgE6R1 has one extra arginine (Arg-26) that is at the same position as Arg-24 in AlgE4R. Arg-26 is flanked by two aspartic acids (Asp-24 and Asp-28). The charge distribution is very similar to AlgE4R, although AlgE6R1 does not have such a strong negatively charged groove on the bottom because both nonconserved positions contain amidic instead of acidic amino acids. In the case of AlgE6R2, the charges are more equally distributed, and there are no areas on the electrostatic surface that have a strongly positive or strongly negative potential. Although AlgE6R2 and AlgE6R3 have very similar sequences, their electrostatic surfaces are very different. Most pronounced is the electropositive area on the front side of AlgE6R3. It consists of the four basic amino acids Arg-51, Lys-53, Arg-82, and Arg-121. Arg-82 is the only residue that is not conserved in all four R-modules. In the other modules, there is a leucine at this position. The distance between Arg-82 and Arg-51 is about 4.5 Å, and Arg-82 and Lys-53 are about 8 Å away from each other. In particular, the distance between Arg-51 and Arg-82 is extremely short considering that both are positively charged. Additionally, Arg-121 is also relatively close to Arg-82, at 8 Å. In the other R-modules, Arg-121 is far from any other basic amino acid.
Besides the basic residues, the acidic and polar amino acids also have to be considered. Many of the acidic and polar amino acids on the front side of the R-modules are also conserved (in AlgE4 they are Asp-74, Glu-107, Asp-119, and Glu-126, and Ser-44, Tyr-96, Asn-105, and Tyr-112). Both groups of amino acids can bind alginate through hydrogen bonds and probably stabilize the proteins by avoiding electrostatic repulsion between the basic residues that might otherwise cause labile protein structures. A recent study shows that the presence of a negative charge at the substrate binding groove had a significant effect on the ability of an epimerase to move in a well defined manner (20). This suggests that if the binding groove had only basic residues, the alginate polymer would not be able to dissociate from the enzyme surface and/or the epimerase could not move along the alginate polymer chain.
Roles of the R-module(s)-It has previously been shown that although the R-modules show no epimerization activity, they are able to enhance the activity of the A-module (10-16). Swapping the R-modules between AlgE4 and AlgE6 shows that they also have a strong influence on the epimerization pattern of the epimerase, which was also observed in a mutant epimerase (20). Improving the alginate binding ability of AlgE6A by introducing the AlgE4R module, creating the AlgE64 epimerase mutant, leads to increased G-block formation. However, it is also possible that there is an upper limit for improving the binding ability and thereby the G-block forming capability. If the interaction between enzyme and substrate becomes too strong, it will reduce the catalytic rate of the epimerase.
In the case of AlgE4, the R-module binds to the poly-M and poly-MG oligomers with different affinity. It can be assumed that the charged and polar amino acids contribute strongly to directing the binding and orientation of the alginate on the epimerase surface both before and after each epimerization reaction. The attraction and repulsion between charges can help to move the whole epimerase forward on the alginate polymer in a processive mode of action. The R-module follows the A-module, meaning that in the beginning the R-module may bind to poly-M alginate, but after some epimerization steps, the R-module comes into contact with the MG-blocks that are produced. The R-module then no longer binds as well to the alginate, and the whole AlgE4 detaches. This model fits well with experimental data obtained by Campa et al. (35), who observed that AlgE4 makes ~10 epimerization steps for each association event with the alginate polymer.
Concluding Remarks-In this study, we have determined the three-dimensional solution structures of the three individual R-modules from AlgE6 with NMR spectroscopy. In general, they all contain β-sheets folded into an elongated roll with a positively charged groove along the long axis of the protein on one side. Calcium ions can be incorporated into the loops of the β-roll without an increase in the target function for the structure calculation. SAXS analyses of AlgE4 and AlgE6 show that both enzymes display an elongated shape with some flexibility between the individual modules. This was supported by analysis of line widths based on selected resonances in the 15N-TROSY spectrum obtained for the A- and R-module of intact AlgE4. AlgE4R binds alginate, whereas the individual R-modules from AlgE6 are not able to interact with alginate. Swapping the R-modules between AlgE4 and AlgE6 resulted in a novel epimerase called AlgE64 displaying better G-block forming abilities than AlgE6. This property seems to be correlated with the better alginate binding ability of the AlgE4R module compared with the AlgE6 R-modules. Furthermore, AlgE4R has a higher affinity for poly-M than for poly-MG oligomers, which correlates well with the mode of action and degree of processivity of AlgE4. Altogether, the R-modules influence and modulate the epimerase interaction with alginates and the epimerization pattern.