NMR Structure of the R-module

In the bacterium Azotobacter vinelandii, a family of seven secreted and calcium-dependent mannuronan C-5 epimerases (AlgE1-7) has been identified. These epimerases are responsible for the epimerization of β-d-mannuronic acid to α-l-guluronic acid in alginate polymers. The epimerases consist of two types of structural modules, designated A (one or two copies) and R (one to seven copies). The structure of the catalytically active A-module from the smallest epimerase AlgE4 (consisting of AR) has been solved recently. This paper describes the NMR structure of the R-module from AlgE4 and its titration with a substrate analogue and paramagnetic thulium ions. The R-module folds into a right-handed parallel β-roll. The overall shape of the R-module is an elongated molecule with a positively charged patch that interacts with the substrate. Titration of the R-module with thulium indicated possible calcium binding sites in the loops formed by the nonarepeat sequences in the N-terminal part of the molecule and the importance of calcium binding for the stability of the R-module. Structure calculations showed that calcium ions can be incorporated in these loops without structural violations and changes. Based on the structure and the electrostatic surface potential of both the A- and R-module from AlgE4, a model for the appearance of the whole protein is proposed.

are more flexible without the ability to form gels with cations (4,5). The relative amounts and distribution of M and G vary extensively among different species of both brown algae (6-8) and bacteria (9,10). For brown algae, alginate plays essentially the same role as cellulose does for trees and other plants. The stipes of the algae are mainly G-rich, giving them the stiffness to work as a skeleton, whereas the leaves are M-rich and therefore flexible (11).
In Azotobacter sp., alginate is used as a capsular polysaccharide (12,13), likewise in Pseudomonas sp. (14-16). In Azotobacter vinelandii, the composition and function of the alginates vary significantly depending on the environmental conditions.
Vegetatively growing cells produce alginates that form a loose capsule structure that is easily released into the growth medium. These alginates are typically M-rich (17), but under certain conditions of environmental stress, the cells enter a resting stage designated the "cyst" stage. The cysts are surrounded by a rigid alginate-containing protective coat, and due to the expression of a family of mannuronan C-5 epimerases (the AlgE family, see next paragraph), this coat also contains alginates with GG-blocks and strong gel-forming properties (17,18).
In all known alginate-producing organisms, the alginates are modified at the polymer level by mannuronan C-5 epimerases converting M- to G-residues (19). The periplasmic epimerase AlgG is a part of alginate biosynthesis and is conserved in all known alginate-producing bacteria (20). Recently, the AlgG epimerase structure from Pseudomonas aeruginosa was suggested to contain a right-handed β-helix (21). This protein has, in addition, been shown to be essential for the protection of the alginate polymer against degradation by an alginate lyase, AlgL (22). Additionally, the bacterium A. vinelandii encodes a family of seven secreted and calcium ion-dependent mannuronan C-5 epimerases (AlgE1-7) (23). These epimerases consist of two types of structural modules, designated "A" (~385 amino acids each, with one or two copies) and "R" (~155 amino acids each, with one to seven copies), and additionally, a C-terminal signal peptide in the last of the R-modules (24,25). The A-modules alone are catalytically active, but their reaction rates are significantly increased when covalently bound to at least one R-module (26). The R-module of AlgE4 is not catalytically active on its own (26). The N-terminal region of each R-module consists of four to seven copies of a nonapeptide with the consensus sequence LXGGAGXDX, which is involved in calcium binding (24,26). The individual sequence, number, and distribution of A- and R-modules give the individual epimerases their characteristic modes of action on this substrate. The R-module appears to have a significant role in the reaction catalyzed by the A-module by reducing the Ca²⁺ concentration needed for full activity and by enhancing the reaction rate (26). Recent experiments have also shown that the R-module influences the product specificity of the attached A-module, i.e. the degree of GG- or MG-blocks produced (27).
Therefore, it is essential to study the structure of such an alginate epimerase and its modules in detail to achieve insight into its function. Knowledge of the three-dimensional structure of these proteins opens a range of further investigations concerning the metal binding properties and the substrate binding of alginate epimerases. Of special interest is the role of the R-module(s) after secretion and its/their influence on the function of the A-module(s).
Unsuccessful attempts have been made to crystallize the smallest epimerase (AlgE4 consisting of one A- and one R-module). However, the A-module has been produced and crystallized separately, and recently the structure of this module was solved by x-ray crystallography (28). The A-module belongs to the family of pectate lyase folds. It folds as a single-stranded, right-handed parallel β-helix. This β-helix consists of 12 complete turns, made up of four β-strands each, except for the last turn at the C terminus, where the helix narrows down to three β-strands per turn. An N-terminal amphipathic α-helix forms a cap to this part of the protein (28). Here, we report the structure of the R-module containing the C-terminal signal sequence and its possible calcium and alginate binding sites.

MATERIALS AND METHODS
Sample Preparation-Cloning, production, and purification of the uniformly ¹³C/¹⁵N-labeled R-module (167 amino acids) and the sample conditions for the NMR measurements were described previously (29). Samples for NMR studies contained 1.6 mM R-module in 20 mM HEPES buffer with 50 mM CaCl₂ at pH 6.9 in 95% H₂O/5% D₂O or 99.9% D₂O. All NMR experiments were recorded at 298 K on a Bruker DRX600 spectrometer equipped with a 5-mm xyz-gradient TXI(H/C/N) probe.
NMR Spectroscopy-The sequence-specific resonance assignments and the triple resonance NMR experiments used to obtain them are described in Ref. 29. Three-dimensional ¹³C- and ¹⁵N-edited ¹H,¹H-NOESY spectra were recorded in D₂O and H₂O, respectively. The ¹⁵N-¹H heteronuclear NOEs were calculated from two independently measured and integrated ¹⁵N-HSQC spectra as the ratio of the peak volumes with and without ¹H saturation. Nuclear magnetic relaxation (T₁ and T₂) measurements of ¹⁵N were obtained by exponential fitting of the peak intensities in ¹⁵N-HSQC spectra acquired with different relaxation delays (30,31).
Metal Binding-The displacement of calcium ions in the R-module by paramagnetic thulium was achieved by adding small aliquots of a 4.7 mM TmCl₃ solution for the first three steps until an R-module:calcium:thulium molar ratio of 1:25:0.5 was reached. Thereafter, aliquots of 47 mM TmCl₃ were added for the next thirteen steps until an R-module:calcium:thulium molar ratio of 1:25:10.5 was reached. Finally, aliquots of 470 mM TmCl₃ solution were added in three steps to a final R-module:calcium:thulium molar ratio of 1:25:30.75. In all titration steps, thulium was added directly to the uniformly ¹⁵N-labeled R-module solution in the NMR sample tube in small volumes so that the increase in sample volume could be neglected. A ¹⁵N-HSQC spectrum was recorded for each of the 19 titration steps.
Chemical shift changes in two-dimensional HSQC spectra were quantified by calculation of an absolute change in chemical shift as $\Delta_{\mathrm{abs}} = \sqrt{(\Delta_{\mathrm{H}})^2 + (\Delta_{\mathrm{N}})^2}$ (Eq. 1), where $\Delta_{\mathrm{H}}$ is the chemical shift change of H^N atoms expressed in Hz, and $\Delta_{\mathrm{N}}$ is the chemical shift change of N atoms expressed in Hz.
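As an illustration of how Equation 1 can be evaluated for a single residue, the following minimal sketch assumes that the ¹H and ¹⁵N contributions are combined as a Euclidean norm of the two changes in Hz. The nominal Larmor frequencies for a 600 MHz spectrometer, the function names, and the example shift values are assumptions made for illustration and are not taken from this work.

```python
import math

# Nominal Larmor frequencies (MHz) at a 600 MHz spectrometer; assumed values.
FREQ_1H_MHZ = 600.13
FREQ_15N_MHZ = 60.81


def ppm_to_hz(delta_ppm: float, freq_mhz: float) -> float:
    """Convert a shift difference in ppm to Hz (1 ppm equals freq_mhz Hz)."""
    return delta_ppm * freq_mhz


def absolute_shift_change(d_h_ppm: float, d_n_ppm: float) -> float:
    """Combine 1H and 15N shift changes (given in ppm) into one value in Hz."""
    d_h_hz = ppm_to_hz(d_h_ppm, FREQ_1H_MHZ)
    d_n_hz = ppm_to_hz(d_n_ppm, FREQ_15N_MHZ)
    return math.hypot(d_h_hz, d_n_hz)


# Hypothetical example: 0.05 ppm change in 1H and 0.20 ppm change in 15N.
example_hz = absolute_shift_change(0.05, 0.20)
print(f"absolute change: {example_hz:.1f} Hz")
```

Expressing both changes in Hz before combining them avoids the need for an empirical ¹⁵N scaling factor, which would be required if the shifts were combined in ppm.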
Alginate Binding-The interaction between the R-module and alginate was investigated by adding small amounts of a 10 mM solution of pentameric β-D-mannuronic acid (M₅), dissolved in a buffer identical to that of the NMR sample, to a solution of 0.15 mM U-¹⁵N-labeled R-module in 50 mM CaCl₂, 20 mM HEPES, pH 6.9. M₅ was added in six portions until an R-module:alginate molar ratio of 1:1.2 was reached. In all titration steps, alginate was added in small volumes so that the increase in sample volume could be neglected. A ¹⁵N-HSQC spectrum was recorded for each titration step, and spectral changes were quantified according to Equation 1.
Structure Calculation-NOESY cross-peaks were identified, integrated, and assigned in the aforementioned NOESY spectra using the program XEASY (32). The CALIBA (33) subroutine in CYANA was used to convert cross-peak intensities from NOESY spectra into distance constraints. Backbone torsion angle restraints were obtained from secondary chemical shifts using the program TALOS (34). On the basis of this input, the structure was calculated using the torsion angle dynamics program CYANA (35). Weak restraints on (φ,ψ) torsion angle pairs and on side chain torsion angles between tetrahedral carbon atoms were applied temporarily during the high temperature and cooling phases of the simulated annealing schedule to favor allowed regions of the Ramachandran plot and staggered rotamer positions, respectively. Structure calculations were started from 100 conformers with random torsion angle values. The 20 conformers with the lowest final CYANA target function values were embedded in a water shell of 8 Å thickness and energy-minimized against the AMBER force field (36) with the program OPALp (37).
Incorporation of Calcium Ions into the Structure Calculation-Additional structure calculations were performed with calcium ions incorporated into the structure. The calcium binding loops were determined from the thulium titration experiments. The binding geometry was inferred from alkaline metalloproteases from P. aeruginosa (Protein Data Bank (PDB) code 1KAP) and Serratia marcescens (PDB code 1SAT) (38,39), two proteins with highly similar calcium binding motifs.
The calcium ion van der Waals radius was set to 1.12 Å (40). The positions of the Ca²⁺ ions were restrained by upper limit distance constraints between the respective Ca²⁺ ions and protein atoms using the calcium binding β-roll motifs from P. aeruginosa and S. marcescens (38,39) as templates. One-half of the octahedral coordination is made up by the backbone oxygen atoms of G₂ (2.5 Å upper distance limit), G₄ (2.5 Å), and the side chain oxygen Oδ1 of D₆ (2.5 Å). The coordination sphere is completed by the backbone oxygen atoms from G₁′ (2.5 Å) and X₃′ (2.5 Å) and the side chain oxygen Oδ2 of D₆′ (3.2 Å) of the neighboring GGXGXD loop. To define the two last calcium binding sites, only distance constraints to the first half of the coordination sphere and D₆′ were incorporated, because it was found in the x-ray structure of the P. aeruginosa protease that the rest of the coordination sphere was made up by water ligands (39). Our structure calculations including Ca²⁺ ions are not intended to serve as a direct determination of the structural details of calcium binding, but merely to demonstrate the compatibility of our data with the known calcium binding mode of the nonapeptide motif.
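The restraint set described above can be written down compactly. The sketch below only illustrates the bookkeeping and is not the CYANA input used in this work: the dictionary layout, the function names, and the example distances are hypothetical, whereas the ligand labels and upper limits are those given in the text (OD1 and OD2 denote Oδ1 and Oδ2).

```python
# Upper distance limits (Å) between a Ca2+ ion and its protein ligands, as
# listed in the text. A trailing prime marks a residue of the neighboring loop.
FIRST_HALF = {("G2", "O"): 2.5, ("G4", "O"): 2.5, ("D6", "OD1"): 2.5}
SECOND_HALF = {("G1'", "O"): 2.5, ("X3'", "O"): 2.5, ("D6'", "OD2"): 3.2}


def restraints_for_site(water_completed: bool) -> dict:
    """Return the upper limits applied to one calcium site.

    For the two last sites (water_completed=True) only the first half of the
    coordination sphere and D6' are restrained, as described in the text.
    """
    limits = dict(FIRST_HALF)
    if water_completed:
        limits[("D6'", "OD2")] = SECOND_HALF[("D6'", "OD2")]
    else:
        limits.update(SECOND_HALF)
    return limits


def violations(distances: dict, limits: dict) -> dict:
    """Map each ligand to the amount (Å) by which it exceeds its upper limit."""
    return {
        ligand: round(distances[ligand] - limit, 3)
        for ligand, limit in limits.items()
        if ligand in distances and distances[ligand] > limit
    }


# Hypothetical distances measured in one conformer (not data from this work).
measured = {("G2", "O"): 2.4, ("G4", "O"): 2.7, ("D6", "OD1"): 2.5}
print(violations(measured, restraints_for_site(water_completed=True)))
```

Because these are upper limits only, a distance equal to or shorter than the limit counts as satisfied; ligands without a measured distance are skipped.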

RESULTS
The solution structure of the R-module was calculated on the basis of NOE upper distance limits and TALOS-derived (34) backbone torsion angle restraints (Table 1). The coordinate and constraint files were deposited in the PDB data base (accession code 2AGM). Fig. 1 shows the sequence and the location of the secondary structure elements, whereas Fig. 2, A and C, show the three-dimensional structure. The structured N-terminal end of the R-module is a β-roll defined by three amino acids forming a short β-strand and six amino acids forming a loop to the next short β-strand followed by six amino acids forming the next loop, resulting in an 18-amino-acid β-roll unit. This is reflected by 19.5% of all long range NOEs being observed between atoms that are 18 residues apart in the sequence. This structure is repeated three times (β₁ 6-8, β₂ 15-17, β₃ 24-26, β₄ 33-35, β₅ 42-44, β₆ 51-53) making up three turns of the β-roll (Figs. 1 and 3). Thereafter, a loop-out forms an anti-parallel β-hairpin (β₇ 61-63, β₈ 66-68). The polypeptide chain folds back and makes a new turn elongating the β-roll (β₉ 71-73, β₁₀ 81-83). Thereafter, a longer loop bulges out, followed by a less well defined region (91-100). The structure ends with two anti-parallel β-strands (β₁₁ 100-104, β₁₂ 111-114). The last β-strand (β₁₃ 127-129) is situated in between the roll and the anti-parallel β-sheet (β₁₁-β₁₂), parallel to the first and anti-parallel to the latter. Finally, the polypeptide again forms a somewhat less well ordered loop structure from residue 130 onward. From amino acid residue 145 to the C terminus (residue 167), no ordered secondary structure exists. This is in agreement with heteronuclear NOE measurements (data not shown) pointing at a more mobile polypeptide chain.
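The 18-residue periodicity of the β-roll can be checked directly from the strand positions listed above. The following minimal sketch uses only the residue ranges of the first six β-strands given in the text; the function names are hypothetical.

```python
# First six β-strands of the β-roll as (first residue, last residue),
# taken from the residue ranges given in the text.
STRANDS = [(6, 8), (15, 17), (24, 26), (33, 35), (42, 44), (51, 53)]


def consecutive_offsets(strands):
    """Sequence offsets between the start of each strand and the next one."""
    return [b[0] - a[0] for a, b in zip(strands, strands[1:])]


def same_face_offsets(strands):
    """Offsets between every second strand, i.e. strands that stack on the
    same face of the roll in successive turns."""
    return [b[0] - a[0] for a, b in zip(strands, strands[2:])]


print(consecutive_offsets(STRANDS))  # one nonapeptide repeat apart
print(same_face_offsets(STRANDS))    # one full turn of the roll apart
```

Consecutive strands start nine residues apart (one nonapeptide repeat), and strands stacked on the same face are 18 residues apart, which is consistent with the large share of long range NOEs observed between atoms separated by 18 residues.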
Remarkably, only 27% of the amino acids of the R-module are in regular secondary structure elements (17% in parallel and 10% in anti-parallel β-strands), whereas 73% of the residues form a coil structure. This is corroborated by circular dichroism measurements (results not shown). Altogether, the protein forms an elongated structure along the axis of the β-roll with a small groove at the front side. The electrostatic surface potential of the R-module shows a positively charged patch formed by arginine and lysine side chains along the small groove at the front side of the β-roll. Aspartic acid and glutamic acid residues form negatively charged patches along the turns of the β-roll (Fig. 2D, right).
To obtain further information about the intramolecular dynamics of the R-module, ¹⁵N-¹H NOEs as well as T₁ and T₂ of ¹⁵N were measured (data not shown). The ¹⁵N-¹H NOEs show increased flexibility for the N-terminal residues 1-7 and a highly mobile chain from residue 145 onward. The same tendency is reflected in the ¹⁵N T₁ and T₂ relaxation times. For the structured part of the R-module, the T₁:T₂ ratio is a direct measure of the correlation time of the overall rotational tumbling of the molecule. The average T₁:T₂ ratio for the protein was 12.1 ± 1.1, which, assuming a spherical particle, corresponds to an overall rotational correlation time τc = 10.7 ± 0.5 ns (30).
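The relation between the T₁:T₂ ratio and the correlation time can be illustrated with a commonly used closed-form approximation for slowly tumbling, rigid molecules, τc ≈ sqrt(6·T₁/T₂ − 7) / (4π·ν_N). This is a sketch under stated assumptions and not necessarily the procedure of Ref. 30, which may rely on the full spectral density expressions; the ¹⁵N resonance frequency of 60.81 MHz (nominal for a 600 MHz spectrometer) and the function name are assumptions.

```python
import math

NU_15N_HZ = 60.81e6  # assumed 15N resonance frequency at 600 MHz (1H)


def tau_c_from_t1_t2(t1_over_t2: float, nu_n_hz: float = NU_15N_HZ) -> float:
    """Approximate overall rotational correlation time in seconds.

    Uses tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N), an approximation that holds
    for slow isotropic tumbling and neglects fast internal motion.
    """
    argument = 6.0 * t1_over_t2 - 7.0
    if argument <= 0.0:
        raise ValueError("T1/T2 ratio too small for this approximation")
    return math.sqrt(argument) / (4.0 * math.pi * nu_n_hz)


tau_ns = tau_c_from_t1_t2(12.1) * 1e9
print(f"tau_c = {tau_ns:.1f} ns")
```

With the average ratio of 12.1, this approximation gives about 10.6 ns, close to the reported value of 10.7 ns.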
The R-module has been shown to bind calcium ions (26). To obtain further information about calcium binding sites, we attempted to prepare a metal-free apo-form of the R-module by adding EDTA. However, this resulted in immediate precipitation, so that no sample of apo-R-module could be obtained. The importance of Ca²⁺ ions for the stability of the R-module is also underlined by the fact that NMR samples of the R-module at 5 mM CaCl₂ showed visible signs of degradation in the NMR spectra after 48 h at room temperature, whereas the NMR samples at 50 mM CaCl₂ were stable for months. Thus, because an apo-R-module was beyond reach, a titration was performed substituting Ca²⁺ ions with a paramagnetic ion. We chose the trivalent lanthanide ion thulium, which has an ionic radius very similar to that of calcium and therefore can be expected to cause only minor structural changes (41,42). Titration with thulium resulted in chemical shift changes and vanishing signals. The results are summarized in Figs. 2E and 3. The chemical shift changes observed during thulium titration are more pronounced for the amide protons than for the backbone nitrogen atoms, which argues for minimal structure changes, because ¹⁵N shifts are more sensitive to small structural changes (43). These changes are mainly associated with amino acid residues situated in the loops of the β-roll and in the vicinity of negative charges on the surface of the protein. Thulium ions also bind to the negatively charged residues in the C-terminal disordered coil region of the R-module. Attempts to achieve a total replacement of calcium with thulium were made by first unfolding the R-module in urea, followed by refolding in a buffer containing thulium ions instead of calcium. This resulted in precipitation regardless of whether urea was diluted slowly or quickly out of the sample (results not shown).
Also, during titration of the R-module with thulium, the protein precipitated at a thulium concentration of ~6 mM. Again, the presence of calcium proved to be vital for maintaining the structural stability of the R-module.
Titration of the alginate pentamer M₅ to the R-module resulted in specific changes in the spectra. The chemical shift changes occurring upon titration with M₅ are plotted against the sequence in Fig. 4, whereas Fig. 2F shows the location of the amino acids with changing chemical shifts on the surface of the R-module. The binding event must be rather strong, because chemical shift changes essentially stopped after having reached a 1:1 molar ratio of protein and M₅. Not surprisingly, chemical shift changes upon titration with M₅ occurred in the vicinity of the patch of positively charged amino acids on the surface of the R-module.

DISCUSSION
Here we have described a study of the R-module, which is one of two types of structural modules forming the family of seven secreted mannuronan C-5 epimerases in A. vinelandii. We have solved the NMR structure of the R-module from AlgE4 containing a C-terminal signal sequence. This structure opens a rational understanding of the secretion of AlgE4 and its function in relation to the A-module. Calcium is important both for the enzymatic activity of the epimerases and for gel formation in alginates. Therefore, the metal binding property has been investigated using the paramagnetic cation Tm³⁺ to observe possible calcium binding sites in the protein. The structure of the R-module belongs to the class of all-β proteins and shows a rare, right-handed parallel β-roll. Similar structures have only been found in the C-terminal part of the metalloproteases from P. aeruginosa, psychrophilic P. aeruginosa, S. marcescens, and Serratia sp. E-15 (SMP_E-15) (PDB codes 1KAP, 1G9K, 1SAT, and 1SRP) (38,39,44,45). The N-terminal β-roll of the R-modules contains a characteristic motif of nonapeptide repeats with the consensus sequence (LXGGAGXDX)n (24), which is simply a cyclic permutation of the consensus sequence (GGXGXDXUX)n (U = a large hydrophobic amino acid) that Baumann et al. (39) found in the β-roll of the metalloproteases. This nonapeptide repeat motif was originally found in a number of toxins from different Gram-negative bacteria. It was denoted RTX (repeats in toxin) (46,47). Later, it was also found in other non-toxin related proteins, such as metalloproteases, iron-regulating proteins, and lipases, including I.3 lipases (48,49). A common feature for these RTX motif-containing domains is that they are located after the N-terminal catalytic domain and that they have a C-terminal secretion signal for translocation through a type I secretion system or ATP binding cassette transporter (48,50,51).
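That the two consensus sequences are cyclic permutations of each other can be verified with a few lines of code. In the minimal sketch below, X is treated as a wildcard and U as any large hydrophobic residue; the residue set chosen for U and the function names are assumptions made for illustration.

```python
# Residues counted as "large hydrophobic" (U); this set is an assumption.
LARGE_HYDROPHOBIC = set("LIVMFWY")


def compatible(a: str, b: str) -> bool:
    """True if two consensus symbols can describe the same position."""
    if a == "X" or b == "X":
        return True
    if a == "U":
        return b == "U" or b in LARGE_HYDROPHOBIC
    if b == "U":
        return a in LARGE_HYDROPHOBIC
    return a == b


def matching_rotations(motif: str, reference: str) -> list:
    """Return (shift, rotated reference) for each rotation matching the motif."""
    if len(motif) != len(reference):
        return []
    hits = []
    for shift in range(len(reference)):
        rotated = reference[shift:] + reference[:shift]
        if all(compatible(m, r) for m, r in zip(motif, rotated)):
            hits.append((shift, rotated))
    return hits


print(matching_rotations("LXGGAGXDX", "GGXGXDXUX"))
```

Exactly one rotation is compatible: starting the metalloprotease consensus at its U position gives UXGGXGXDX, which aligns with LXGGAGXDX position by position.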
Mapping the consensus sequence of the β-roll onto the structure, the XUX motif forms the β-strand. The large hydrophobic amino acids (U) are buried in the hydrophobic core of the β-roll, where they manifest themselves by numerous NOESY cross-peaks between the protons of these side chains. The large hydrophobic amino acid is leucine in the first five β-strands and phenylalanine in the sixth β-strand (52). Thereafter follows Phe-54, which breaks up the tightly packed β-roll. Subsequently, only one more turn is added to the β-roll. This is also observed in the x-ray structures of the metalloproteases. The introduction of β-roll-breaking aromatic side chains can be a way to control or stop the elongation of the β-roll.
The GGXGXD motif forms the loop between the two β-strands in the two-layered β-sandwich. The loops have an interesting pattern of NOEs between H^N and Hα2,3 of G₁ to Hδ or Hε of U₈ and from Hβ of D₆ to Hδ or Hε of U₈. The side chain of D₆ is buried in the β-roll, and the backbone carbonyl oxygen atoms from G₂, G₄, G₁′, and X₃′ are oriented toward the interior of the loop. Such buried negative charges in a protein are energetically highly unfavorable. The electron-rich environment as well as the presence of a cavity between the loops suggests a metal binding site, as a metal binding would neutralize the unfavorable unpaired charge in the hydrophobic core. Thulium titration data show calcium replacement in the loops between residues 8-14, 26-32, 45-50, and residue 80, whereas indication of a weaker interaction was observed for the loops 18-23 and 36-41 and residue 69. The calculated structure of a loop region of the R-module was overlaid with a calcium binding loop from the β-roll in the metalloprotease from P. aeruginosa (Fig. 2B). The high structural identity of the loop regions suggests that the R-module binds calcium in essentially the same way as the metalloproteases from P. aeruginosa and S. marcescens. The incorporation of calcium ions into the structure calculation of the R-module resulted in only a minor increase of the target function (Fig. 2D; Table 1). Even though this is no direct evidence for the calcium binding mode of the R-module, it shows that a calcium binding mode equal to that of the P. aeruginosa metalloprotease is fully compatible with our experimental data. Exchange of calcium with thulium proved to happen more readily compared with the corresponding substitution in the metalloproteases, where it could not be achieved. Only the two most C-terminal calcium ions at the end of the β-roll in the metalloproteases could be replaced. This was ascribed to the fact that water molecules substitute for some of the protein ligands in the coordination of these ions (39).
Nor could the calcium ions be exchanged in a synthetic β-roll (39,45,52). This dissimilarity could be due to the R-module being a truncated protein. The lack of the N-terminal part compared with the metalloproteases could make the β-roll more flexible. The negatively charged patch at one side of the R-module and a similar negatively charged patch on one side of the A-module could function as cation storage (e.g. calcium, barium, strontium) for the AlgE4 epimerase, to conserve the protein integrity and/or to facilitate gel formation after epimerization of the alginate and thereby drive the reaction (53,54). As mentioned above, the RTX protein family and the R-module contain a C-terminal secretion signal for translocation through ATP binding cassette transporters, where the proteins are secreted across the inner and outer cell membranes into the medium of Gram-negative bacteria. This secretion signal is not cleaved from the mature protein. In the literature (48,55,56), there are several suggestions as to what triggers translocation. For some RTX proteins, it has been suggested that the signal for secretion is located within the 60 amino acids at the C terminus, but no specific sequence responsible for secretion has been identified. Hui et al. (55) suggest that the recognition motif could be a sequence of secondary structure elements consisting of an amphiphilic α-helix followed by a charge-rich linker (8-10 residues) and another amphiphilic α-helix, positioned ~45 amino acid residues from the C terminus. Such structure elements are not found in the R-module, and there is no indication of other secondary structure elements in the C-terminal part of the protein. It has even been suggested that a general signal for translocation does not exist for these RTX proteins (48). In the R-module of AlgE4, the 22 C-terminal amino acid residues 145-167 have a disordered conformation in solution.
In the AlgE epimerase family, there are some common features for the C termini. First, there are short stretches of hydrophobic amino acids from residues 142-144, 155-156 (numbering according to the R-module of AlgE4), and at the very C-terminal end (25). Second, there are no positively charged amino acids found in the last 31 amino acids of the random coil C-terminal tail. This indicates the possibility that negative charges or a cation binding segment could trigger translocation through the ATP binding cassette transporter.
The function of the RTX domain in relation to the N-terminal catalytic domain has been proposed to facilitate translocation of larger proteins and to ensure structural integrity after translocation (48,57,58). This does not explain why AlgE2, AlgE5, AlgE6, and AlgE7 have three to four R-modules. Common for them all is that only the C-terminal R-module contains the secretion signal.
No experimental data are yet available on the structure of the whole AlgE4 protein consisting of one A-module followed by one R-module. Both the A- and the R-module exhibit characteristic, positively charged patches on their surfaces. This similarity, seen in the light of the nature of the protein's substrate, a negatively charged polymer, suggests that the whole AlgE4 protein might form a single elongated structure with a long, positively charged patch for substrate binding (Fig. 2D). Also from the point of view of the secondary structure elements, the R-module appears as a natural extension of the A-module. The C terminus of the A-module contains some prolines that might be the reason for the observed narrowing of a four-stranded to a three-stranded β-helix. Because 6 of a total of 12 prolines are found in the region between the A- and the R-module, one may speculate that the function of the prolines is to force the β-helix into the β-roll structure and thus form a transition from the A- to the R-module. Altogether, AlgE4 seems to consist of a peculiar, right-handed parallel β-helix structure that is narrowing down to a β-roll, a fold not previously reported in the literature.
We have seen strong interactions between the R-module and an alginate M₅ pentamer. This is in contrast to results achieved previously by atomic force microscopy, which showed that the A-module of AlgE4 binds to alginate more strongly than the full-length protein, whereas the R-module was not observed to interact directly with the polymer (59). Our NMR data point at the interaction taking place in the vicinity of a cluster of positively charged side chains on the surface of the protein. The significance of these side chains for alginate binding is also underlined by the fact that some of these positively charged amino acids, Arg-24, Arg-40, Arg-110, and Lys-114, are highly conserved throughout the 25 R-modules of A. vinelandii.
As mentioned, the activity of the A-module is stimulated by the R-module. This might possibly be linked to the alginate binding property of the R-module.
In the absence of a structure of the complete AlgE4 protein, the structures of the R- and A-modules provide vital information on the architecture of this protein. Similar to the A-module, the R-module folds as a parallel right-handed β-roll, and it is the first structure solved with NMR spectroscopy having this rare fold. It was demonstrated that calcium binding is crucial for the structural stability of the R-module. Thulium titration of the R-module has revealed the location of metal binding sites. Metal ions are bound in the loops homologous to other RTX proteins. Calcium binding can be important for epimerization, stabilization, and/or gel formation of alginate. Alginate titration showed specific and strong interactions between the R-module and a pentameric M₅ alginate. Knowledge about the structure and alginate interaction of the R-module provides the basis for better insight into the structure and catalytic activity of the A. vinelandii AlgE1-7 alginate epimerase family.