Evolution of Metal(loid) Binding Sites in Transcriptional Regulators*

Expression of the genes for resistance to heavy metals and metalloids is transcriptionally regulated by the toxic ions themselves. Members of the ArsR/SmtB family of small metalloregulatory proteins respond to transition metals, heavy metals, and metalloids, including As(III), Sb(III), Cd(II), Pb(II), Zn(II), Co(II), and Ni(II). These homodimeric repressors bind to DNA in the absence of inducing metal(loid) ion and dissociate from the DNA when inducer is bound. The regulatory sites are often three- or four-coordinate metal binding sites composed of cysteine thiolates. Surprisingly, in two different As(III)-responsive regulators, the metalloid binding sites were in different locations in the repressor, and the Cd(II) binding sites were in two different locations in two Cd(II)-responsive regulators. We hypothesize that ArsR/SmtB repressors have a common backbone structure, that of a winged helix DNA-binding protein, but have considerable plasticity in the location of inducer binding sites. Here we show that an As(III)-responsive member of the family, CgArsR1 from Corynebacterium glutamicum, binds As(III) to a cysteine triad composed of Cys15, Cys16, and Cys55. This binding site is clearly unrelated to the binding sites of other characterized ArsR/SmtB family members. This is consistent with our hypothesis that metal(loid) binding sites in DNA binding proteins evolve convergently in response to persistent environmental pressures.

Arsenic, a toxic metalloid, is currently ranked first on the Superfund List of Hazardous Substances (available on the World Wide Web), in part because of its environmental ubiquity. As a consequence, nearly all organisms have genes that confer resistance to arsenic. Environmental arsenic is sensed by members of the ArsR/SmtB family of metalloregulatory proteins (1-3). These winged helix repressor proteins specifically bind arsenic and other toxic metals and thereby control expression of genes involved in arsenic biotransformation and efflux. For example, the ArsR repressor encoded by Escherichia coli plasmid R773 binds to the promoter region of its ars operon in the absence of As(III) or Sb(III) (4). This homodimeric repressor has the sequence Cys32-Val-Cys34-Asp-Leu-Cys37 in the DNA binding domain of each monomer (5,6). The three thiolate sulfurs of the cysteine residues form a very specific three-coordinate binding site for the trivalent metalloids As(III) and Sb(III). Binding of metalloids to R773 ArsR is presumed to induce a conformational change, leading to dissociation from the DNA and hence derepression. The Staphylococcus aureus CadC is a Cd(II)/Pb(II)/Zn(II)-responsive member of the ArsR/SmtB family that has four cysteine residues in the inducer binding domain (7). Of these four cysteine residues, two come from one subunit, whereas the other two come from the other subunit of the homodimer (8,9). The position of this metal binding site in CadC is congruent to that of the R773 ArsR, but the site is formed between two monomers. CadC also has a second type of metal binding site (DXHX10HX2E) for Zn(II) at the dimer interface that is not a regulatory site. This site, however, is identical to the regulatory Zn(II) site of SmtB from Synechococcus PCC 7942 (10).
Another member of the ArsR/SmtB family is the ArsR from Acidithiobacillus ferrooxidans (AfArsR), which has three cysteine residues (Cys95, Cys96, and Cys102) at the dimer interface rather than in or near the DNA binding domain (11). These three cysteine residues form a three-coordinate or S3 binding site for trivalent metalloids (12). Although both the As(III) binding site of AfArsR and the Zn(II) binding site of SmtB are at the C-terminal dimerization domain, the former is formed of three cysteine residues within a single subunit (two sites per dimer), whereas the latter is formed by four residues, two residues from one monomer and two from the other. Thus, metal(loid) binding sites appear to arise by convergent evolution, even in homologous proteins.
In the present study, we focused on two CgArsRs, which regulate expression of the ars operons of Corynebacterium glutamicum, and their arsenic binding properties. C. glutamicum is a Gram-positive nonpathogenic rod-shaped soil bacterium with high intrinsic resistance to arsenic. Analysis of the arsenic resistance mechanisms in C. glutamicum revealed the presence of two inducible ars operons (ars1 and ars2) that are located distant from each other in the chromosome (13). Expression of both operons is induced by arsenite. Their repressors, CgArsR1 and CgArsR2, which are 66% identical, both have three cysteine residues: two N-terminal residues (Cys15 and Cys16) that have no counterpart in any of the other characterized members of the ArsR/SmtB family and a third cysteine residue (Cys55) that is near but not within the putative DNA binding domain of the regulatory proteins. The positions of these cysteine residues in the two CgArsRs resemble those of CadC but appear to have arisen independently. Homology modeling of CgArsR1 suggests two As(III) binding sites, each of which is formed by Cys15 and Cys16 from one subunit and Cys55 from the other subunit. The results of in vivo and in vitro studies support this hypothesis. Thus, CgArsR1 provides a new example of convergent and parallel evolution of metal(loid) binding sites built on the framework of a common DNA-binding protein.

MATERIALS AND METHODS
Strains, Media, and Reagents-Escherichia coli cells were grown in LB medium at 37°C. C. glutamicum was grown in Tryptone soya broth (or Tryptone soya broth with 2% agar added) at 30°C (14). Media were supplemented with 100 μg/ml ampicillin, 25 μg/ml kanamycin, 12.5 μg/ml tetracycline, or 25 μg/ml chloramphenicol (8 μg/ml for C. glutamicum), as required. E. coli strain Top 10 (Invitrogen) was used for plasmid construction and mutagenesis, strain BL21(DE3) (Novagen, Madison, WI) was used for protein expression analysis, and strain S17-1 (13) was used as the donor in conjugation assays. Bacterial growth was monitored as A600 nm. All reagents were obtained from commercial sources.
Construction of Strains and Plasmids and Mutagenesis-For CgArsR1 and CgArsR2 expression in E. coli, plasmids pETArsR1 and pETArsR2 were constructed; in both plasmids, the arsR genes were under control of the T7 promoter and fused to a C-terminal His tag. Cysteine mutants were constructed with these plasmids.
For in vivo reporter gene expression, a C. glutamicum host strain lacking the two chromosomal ars operons, including arsR1 and arsR2, was constructed. This strain, termed 2Δars, was made for incorporation of wild type arsR1 and seven mutants in single copy at the original position on the chromosome based on the use of the mobilizable plasmid pK18mob (15). A bifunctional plasmid containing the original regulatory region from the main operator/promoter site of the ars1 operon (o/p) fused in the arsB direction to the egfp2 reporter gene was used for further in vivo analysis. The methods for strain and plasmid construction and for construction of mutants are given in the supplemental materials.
Purification of Wild Type and Mutant CgArsRs-E. coli BL21(DE3) cells bearing pETArsR1, pETArsR2, or one of the seven arsR1 mutant plasmids were cultured and induced with 0.3 mM isopropyl β-D-1-thiogalactopyranoside. Cells were harvested, washed once with a buffer consisting of 50 mM MOPS-KOH, pH 7.5, containing 20% (v/v) glycerol, 0.5 M NaCl, 20 mM imidazole, and 10 mM 2-mercaptoethanol (buffer A), and suspended in 10 ml of buffer A/g of cells. The cells were lysed with a French press at 20,000 p.s.i. in the presence of diisopropyl fluorophosphate (2.5 μl/g), and the lysate was centrifuged (150,000 × g for 1 h). The supernatant was loaded onto a Ni(II)-nitrilotriacetic acid column preequilibrated with buffer A at a flow rate of 0.5 ml/min. The column was washed with 150 ml of buffer A, followed by elution with 60 ml of buffer A containing 0.2 M imidazole. The eluted protein was concentrated with a 10 kDa cut-off Amicon Ultra centrifugal filter (Millipore Corp., Billerica, MA) and further purified by gel filtration using Superdex 75 (Amersham Biosciences) in a 45 × 1.5-cm column with a total bed volume of 80 ml. The protein was eluted with a buffer consisting of 50 mM MOPS-KOH, pH 7.5, 0.5 M NaCl, 20% (v/v) glycerol, 0.2 mM EDTA, and 2 mM dithiothreitol at a flow rate of 0.3 ml/min. CgArsR1 was identified by SDS-PAGE (16). Fractions containing wild type and mutant proteins were concentrated by ultrafiltration centrifugation. Protein concentrations were estimated by the method of Bradford (17) or by determining the A280 nm using a calculated extinction coefficient of 1615 cm⁻¹ (18).
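As a consistency check on the gel filtration step, the bed volume of a cylindrical column follows directly from its dimensions. The short sketch below assumes that the stated 45 × 1.5-cm dimensions are bed length × inner diameter (an assumption; the text does not say which value is which), and the function name is illustrative only.

```python
import math


def bed_volume_ml(length_cm: float, diameter_cm: float) -> float:
    """Volume of a cylindrical column bed in ml (1 cm^3 = 1 ml).

    Assumes the bed is a perfect cylinder of the given length and
    inner diameter; both arguments are in centimeters.
    """
    radius_cm = diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * length_cm


# Assumed reading of "45 x 1.5-cm": length 45 cm, inner diameter 1.5 cm.
volume = bed_volume_ml(45.0, 1.5)
print(f"{volume:.1f} ml")
```

Under that assumption the computed volume is close to the stated total bed volume of 80 ml.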
Cross-linking Assays-Dibromobimane (bBBr) (Invitrogen) cross-linking was performed as described (8). CgArsR1 was reduced in the presence of 2 mM tris(2-carboxyethyl)phosphine on ice for 30 min. Reductants were removed by dialysis prior to the assay. Purified CgArsR1 (30 μM, except for C15S, which was 100 μM) was incubated with a 3-10-fold molar excess of bBBr. Samples were analyzed by SDS-PAGE using 16% gels. The gels were visualized under UV light (365 nm) and then stained with Coomassie Blue.
X-ray Absorption Spectroscopy (XAS)-Purified wild type and mutant CgArsR1 at 4 mM concentration were incubated on ice for 90 min with 2-3 mM sodium arsenite in a buffer consisting of 50 mM MOPS-KOH, pH 7.5, and 0.5 M NaCl. Glycerol, 30% (v/v) final concentration, was added, and arsenic-loaded protein was injected into Lucite cells covered with Kapton tape. Sample cells were flash-frozen in liquid nitrogen. XAS data were collected at the Stanford Synchrotron Radiation Laboratory on beamline 9-3 in a manner previously described (12). Data reduction and analysis were performed using the EXAFSPAK software suite (available on the World Wide Web). Prior to data reduction and averaging, each fluorescence channel of each scan was examined for spectral anomalies. Published XAS spectra represent the average of five scans. Data reduction involved fitting the pre-edge spectral region with a second order polynomial and the EXAFS region with a cubic spline constructed of three regions. Following spline subtraction, EXAFS data were converted into k space using an E0 of 11,885 eV. The k³-weighted EXAFS was truncated at 13.0 Å⁻¹ (due to monochromator glitches at higher k), Fourier-transformed, and simulated. The final published fitting results are from analysis of the raw unfiltered data. Data were fit using single scattering amplitude and phase functions generated with Feff 7.2 (19). Single scattering Feff models were calculated for oxygen, sulfur, and arsenic coordination to simulate arsenic-nearest neighbor ligand scattering. Values for the scale factor (Sc) and E0, obtained previously from fitting crystallographically characterized arsenic model compounds, were equal to 0.98 and −10, respectively (20). During the fitting analysis, the scale factor and E0 were fixed. The coordination number was fixed but manually varied at half-integer values. Values for the bond length and Debye-Waller factor were allowed to vary freely. The lowest mean square deviation between data and fit, corrected for the number of degrees of freedom (F′), was used to judge best fit EXAFS simulations and to justify the addition of supplemental interactions within the simulation.
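The conversion of energy to k space described above rests on the standard free-electron relation k = sqrt(2mₑ(E − E0))/ħ. The minimal sketch below illustrates that relation and the k³ weighting, using the E0 of 11,885 eV given in the text. It is not the EXAFSPAK implementation; the function names and the rounded numerical constant are illustrative assumptions.

```python
import math

# 2*m_e/hbar^2 expressed in 1/(eV * Angstrom^2); rounded value of the
# standard EXAFS conversion constant (equivalent to k = 0.5123*sqrt(E - E0)).
K_CONST = 0.262468


def energy_to_k(energy_ev: float, e0_ev: float = 11885.0) -> float:
    """Photoelectron wave number k (1/Angstrom) for a photon energy in eV."""
    delta = energy_ev - e0_ev
    if delta < 0:
        raise ValueError("energy lies below E0; k is undefined there")
    return math.sqrt(K_CONST * delta)


def k_to_energy(k: float, e0_ev: float = 11885.0) -> float:
    """Inverse of energy_to_k: photon energy in eV for a given k."""
    return e0_ev + k ** 2 / K_CONST


def k3_weight(k: float, chi: float) -> float:
    """Apply k^3 weighting to a single EXAFS amplitude chi(k)."""
    return chi * k ** 3


# Energy at which the data would be truncated for k = 13.0 1/Angstrom.
print(round(k_to_energy(13.0), 1))
```

With this E0, the truncation point of 13.0 Å⁻¹ corresponds to a photon energy of roughly 12,529 eV.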
DNA Footprinting-DNA footprinting was performed as described (12). Complementary DNA primers in which one was labeled at the 5′-end with WellRED D3 fluorescence dye (Integrated DNA Technologies, Coralville, IA), as indicated, were used to amplify the C. glutamicum ars1 o/p region. Two pairs of primers were used, one for determining the arsRB direction and the other for the arsBR direction. After optimization, a high molar excess of CgArsR1 was added at room temperature for 30 min. As a negative control, 5 μM bovine serum albumin was used in place of CgArsR. The samples were loaded into 96-well plates and assayed with a CEQ 2000XL DNA sequencer (Beckman Coulter, Fullerton, CA). The data were analyzed with CEQ 2000 fragment analysis software.
DNA Binding by Fluorescence Anisotropy-DNA binding studies by fluorescence anisotropy were performed as described (12) using a Photon Technology International spectrofluorimeter fitted with polarizers in the L format. Complementary 30-mer oligonucleotides, one of which was labeled at the 5′-end with carboxyfluorescein, were synthesized (Integrated DNA Technologies, Inc.) based on the DNA footprinting results. Wild type or mutant CgArsR1 was titrated into 0.4 ml of 30 nM fluorescein-labeled ars1 o/p DNA. Changes in anisotropy were calculated after each addition using the supplied Felix32 software. To measure dissociation of CgArsR1 from the DNA upon binding of As(III) or Sb(III), 30 nM carboxyfluorescein-labeled DNA was saturated with 10 μM wild type or mutant CgArsR1. CgArsR1 was treated with methyl methanethiosulfonate to modify the cysteine thiolates to -S-S-CH3. Dissociation of CgArsR1 was induced by the addition of varying amounts of As(III) or Sb(III).
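For orientation, steady-state anisotropy in the L format is conventionally computed from the vertically and horizontally polarized emission intensities with a G-factor correction for the detection channel. The sketch below shows that textbook relation. It is not taken from the Felix32 software, and the function and parameter names are hypothetical.

```python
def g_factor(i_hv: float, i_hh: float) -> float:
    """Instrument correction factor G = I_HV / I_HH.

    Both intensities are measured with horizontally polarized excitation;
    the second letter gives the emission polarizer orientation.
    """
    return i_hv / i_hh


def anisotropy(i_vv: float, i_vh: float, g: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_VV - G*I_VH) / (I_VV + 2*G*I_VH).

    i_vv and i_vh are emission intensities measured with vertically
    polarized excitation and vertical or horizontal emission polarizers.
    """
    return (i_vv - g * i_vh) / (i_vv + 2.0 * g * i_vh)


# Fully depolarized emission (equal intensities, G = 1) gives r = 0.
print(anisotropy(1.0, 1.0))
```

A rapidly tumbling free probe gives values near zero, and slower tumbling on protein binding raises r toward its limiting value.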

RESULTS
CgArsR1 and CgArsR2 Have Atypical Metalloid Binding Sites-A structure-based multiple alignment of CgArsR1 and CgArsR2 with other ArsR homologues and CadC was performed using the structure of CadC (9) (Fig. 1A). CgArsR1 and CgArsR2 are 66% identical to each other and exhibit high similarity to other members of the ArsR/SmtB family. Notably, both CgArsR1 and CgArsR2 lack the typical Cys32-Cys34-Cys37 three-coordinate As(III) binding site of the well characterized R773 ArsR. They also lack the Cys95-Cys96-Cys102 As(III) binding site of AfArsR (12). CadC binds Cd(II) in a four-coordinate binding site composed of four cysteine residues. CadC has a second, nonregulatory four-coordinate Zn(II) binding site composed of residues Asp101 and His103 from one monomer and His114′ and Glu117′ from the other monomer. The two CgArsRs lack the acidic and histidine residues of those sites as well. How do these ArsRs respond to environmental As(III), and where are the metalloid binding sites? Although they lack the binding sites identified in homologous metalloregulators, they have three cysteine residues, Cys15, Cys16, and Cys55, that, from the alignment, do not correspond to metal(loid) ligands of homologues. From the structure-based alignment with CadC, a homology model of CgArsR1 was constructed (Fig. 1B). Like CadC, CgArsR1 has a putative helix-turn-helix DNA binding domain consisting of α4 and α5. In CadC, the Cd(II) binding domain is composed of Cys58 and Cys60 in α4 of one subunit and Cys7 and Cys11 from the N-terminal region of the other subunit. Although CgArsR1 has no corresponding cysteine residues, Cys55 is proposed to be located just before α4, and Cys15 and Cys16 are in α1. Either Cys15 or Cys16 can be positioned close enough to Cys55 to form a two-coordinate complex with As(III). Since the side chains of Cys15 and Cys16 would face in opposite directions, the α1 helix would have to be unraveled from the N-terminal end to allow them to form a three-coordinate complex with Cys55 (Fig. 1C).
Our working hypothesis, therefore, is that these three cysteine residues form a three-coordinate As(III) binding site, where Cys15 and Cys16 come from one subunit and Cys55 comes from the other. The experiments below were designed to test that hypothesis.
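The percent identity quoted for the two repressors depends on how aligned columns are counted. As an illustration only, the sketch below counts identical residues over columns in which neither sequence has a gap. That is one common convention and not necessarily the one used for Fig. 1A; the function name is hypothetical, and the example strings are toy sequences, not CgArsR sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap.

    Both arguments must be rows of the same alignment and therefore
    have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 3 identical residues out of 4 ungapped columns.
print(percent_identity("ACC-D", "ACGTD"))  # prints 75.0
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so a quoted identity is only comparable across studies when the denominator is the same.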
The As(III) Binding Site in CgArsR1 Is Composed of Three Sulfur Thiolates-Binding of As(III) and the geometry of the binding site were analyzed by XAS. Near edge x-ray absorption fine structure analysis indicates that arsenic in purified wild type CgArsR1 is stably bound as As(III) (data not shown). The observed value for the first inflection point energy is 11,866 eV, consistent with the observed values for As(III) binding sites (20). Simulation analysis of the EXAFS data for CgArsR1 was best fit with a single nearest neighbor ligand environment constructed with only three sulfur atoms (Fig. 2, A and B). The average arsenic-sulfur bond length was calculated to be 2.25 Å. Attempts to simulate these data with a mixed sulfur and oxygen nearest neighbor environment were unsuccessful. These results are consistent with an As(III) binding site composed of the sulfur thiolates of Cys15, Cys16, and Cys55 (Fig. 2C).
Contribution of CgArsR1 Cysteine Residues to Metalloregulation in Vivo-To examine whether Cys15, Cys16, and Cys55 are involved in As(III) sensing in vivo, the double ars operon deletion strain 2Δars was constructed. In an additional step, wild type and mutant copies of the arsR1 gene were integrated into the chromosome of C. glutamicum 2Δars at the position of the original ars1 operon, creating strains encoding wild type ArsR1 (2Δarswt) and variants C15S, C16S, C55S, C15S/C16S, C15S/C55S, C16S/C55S, or C15S/C16S/C55S. These strains were used as hosts for the bifunctional mobilizable plasmid pECGFPars, which contains the original ars1 o/p regulatory region fused in the arsB direction to the reporter gene egfp2 (a variant of gfp) that has been commonly used for expression analysis in corynebacteria.
Cells of C. glutamicum lacking the two ars operons had constitutive GFP expression (Fig. 3). Cells with wild type arsR1 expressed egfp2 at repressed levels in the absence of metalloid, and the addition of either As(III) or Sb(III) induced expression. As(III) induced well at 10 μM. Sb(III) is not taken up very well by C. glutamicum and is therefore not as effective an inducer as As(III). Even so, Sb(III) induced more at 30 μM than at 10 μM.
All of the C. glutamicum strains containing the single (C15S, C16S, and C55S), double (C15S/C16S, C15S/C55S, and C16S/C55S), or triple (C15S/C16S/C55S) arsR1 mutants repressed egfp2 expression, demonstrating that the mutant ArsR1s were capable of binding to the ars1 o/p. In contrast, no induction was observed with any of the mutants, suggesting that they do not dissociate from the DNA in the presence of As(III) (Fig. 3A) or Sb(III) (Fig. 3B). These results are consistent with the hypothesis that each of the three cysteine residues contributes to the inducer binding site.
DNA Binding Properties of CgArsR1 and CgArsR2 and Contribution of Cysteine Residues to Metalloregulation in Vitro-To examine the DNA binding properties of CgArsRs and the contribution of cysteine residues to metalloid-induced dissociation from the DNA, protein-DNA interactions were analyzed in several ways. First, binding was examined by DNA mobility shift assays. In the two C. glutamicum ars operons, the gene organization is different from the R773 arsRDABC in that the corresponding arsR genes are expressed divergently from the other ars genes. The ability of CgArsR1 and CgArsR2 to bind to their own ars intergenic regulatory DNA (in cis) and also their ability to bind to the other ars intergenic DNA (in trans) were determined. In both cases, bands corresponding to the amplified intergenic regions from either ars1 (igR1B1) (Fig. 4) or ars2 (igR2B2) (data not shown), respectively, were retarded by either CgArsR1 or CgArsR2. These probes were 150 bp (igR1B1) and 180 bp (igR2B2) and contained a portion of the arsR gene as well as the sequence past −70. As shown below, this probe has two sequences that footprint with CgArsR1, but the upstream sequence binds with only low affinity and should not affect DNA mobility in this assay. Instead, the slight differences in mobility can be attributed to increasing concentrations of repressor. Either repressor bound to either promoter site, despite the fact that they have different palindromic sequences in their intergenic regions. Since both repressors appeared to have similar binding abilities, subsequent biochemical analysis was performed with only CgArsR1.
FIGURE 1. A, the structure-based multiple alignment was calculated using 3DCoffee (PubMed ID 15201059). B, modeling of As(III)-free CgArsR was performed with SWISS-MODEL (PubMed ID 12824332) (26), using the crystal structure of CadC (9) as template. The two monomers are colored in purple (subunit a) and green (subunit b), respectively. The cysteines are shown in ball-and-stick models. The DNA binding domain is α4-turn-α5. The model was drawn using PyMOL (27). The side chain sulfur atoms of Cys15, Cys16, and Cys55 are shown as yellow spheres. C, binding is proposed to occur in two steps: 1) As(III) (blue sphere) binds first to the thiolates of Cys55 of one subunit and either Cys15 or Cys16 (shown here for only Cys15) in α1 of the other subunit, forming a low affinity S2O site, and 2) the end of α1 unravels to allow the adjacent cysteine residue to become the third ligand to As(III), forming a high affinity S3 site. The model of the As(III)-bound form was built by manually adjusting the N termini so that the geometry of As(III) coordination was consistent with that derived from the EXAFS results.
Second, to examine DNA binding properties in more detail, the sequence in the intergenic region between the arsR1 and arsB1 genes to which CgArsR1 binds was determined by DNase I footprinting (Fig. 5). CgArsR1 was observed to bind to two regions of 30 base pairs, from −7 to −37 bp and from −47 to −77 bp, each numbered relative to the start of arsB.
DNA binding was further analyzed by fluorescence anisotropy (Fig. 6). Double-stranded oligonucleotides of 30 bp, corresponding to the −7 to −37 and −47 to −77 sequences, were labeled with carboxyfluorescein for anisotropy measurements. The free probes tumble rapidly in solution and hence have relatively low optical anisotropy. Binding of CgArsR1 slows tumbling of the probe, and hence the emitted light is partially polarized. Dissociation of the repressor from the DNA reverses the increase in polarization. CgArsR1 bound strongly to the −7 to −37 probe (Fig. 6A). Despite the footprinting results, CgArsR1 bound only weakly to the −47 to −77 probe. It is possible that binding of the repressor to the −7 to −37 region cooperatively increases binding to the −47 to −77 region. Wild type CgArsR1 bound to the −7 to −37 probe with an apparent affinity of 0.15 μM. Each single, double, and triple cysteine mutant bound to the −7 to −37 probe, although with reduced affinity ranging from 0.5 to 2 μM. When the cysteine residues in wild type CgArsR1 were chemically modified with methyl methanethiosulfonate, the affinity of the modified repressor was similar to that of the cysteine mutants. These results demonstrate that none of the cysteine residues is required for binding to ars1 o/p DNA.
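Apparent affinities such as those above are commonly extracted from anisotropy titrations with a single-site binding model that accounts for depletion of free protein by the labeled DNA. The sketch below illustrates such a model. It assumes 1:1 binding of a repressor dimer to the probe, and the anisotropy limits r_free and r_bound are hypothetical values, not measurements from this study; only the 30 nM probe concentration and the 0.15 μM apparent affinity come from the text.

```python
import math


def fraction_bound(protein_um: float, dna_um: float, kd_um: float) -> float:
    """Fraction of DNA probe bound for 1:1 binding (quadratic solution).

    protein_um and dna_um are total concentrations in micromolar;
    kd_um is the dissociation constant in micromolar.
    """
    b = protein_um + dna_um + kd_um
    return (b - math.sqrt(b * b - 4.0 * protein_um * dna_um)) / (2.0 * dna_um)


def predicted_anisotropy(
    protein_um: float,
    dna_um: float = 0.030,   # 30 nM labeled probe, as in the text
    kd_um: float = 0.15,     # apparent affinity reported for wild type
    r_free: float = 0.05,    # hypothetical anisotropy of free probe
    r_bound: float = 0.15,   # hypothetical anisotropy of bound probe
) -> float:
    """Observed anisotropy as a weighted average of free and bound probe."""
    theta = fraction_bound(protein_um, dna_um, kd_um)
    return r_free + (r_bound - r_free) * theta


# Simulated titration points (total repressor concentration in micromolar).
for conc in (0.0, 0.05, 0.165, 0.5, 2.0):
    print(conc, round(predicted_anisotropy(conc), 4))
```

In this model half of the probe is bound when the total repressor concentration equals Kd plus half the probe concentration, so with 30 nM probe the midpoint lies slightly above Kd.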
The effect of binding of As(III) (Fig. 6B) or Sb(III) (Fig. 6C) on dissociation of the repressor from the ars1 o/p DNA was assayed by fluorescence anisotropy. Wild type CgArsR1 exhibited half-maximal dissociation at 0.15 mM As(III). The concentration of Sb(III) required for half-maximal dissociation was ~10 μM. For unexplained reasons, the addition of reduced glutathione increased the apparent affinity for As(III) but not Sb(III). In contrast to the in vivo results (Fig. 3), Sb(III) was more effective in vitro than As(III). This may reflect relatively poor uptake of Sb(III) in vivo compared with As(III). However, none of the single, double, or triple mutants dissociated from the DNA with concentrations of As(III) or Sb(III) as high as 1 mM. These data indicate that each of the three cysteine residues is necessary for dissociation from the DNA and hence derepression.
Cys15 and Cys16 from One Subunit Are in Proximity of Cys55 in the Other Subunit-In CadC, the Cd(II) binding site is composed of two cysteine residues from the N-terminal region of one subunit and two cysteine residues from α4 of the DNA binding domain of the other subunit of the homodimer (9). In the homology model of CgArsR1 built on the CadC structure, the three-coordinate As(III) binding site is proposed to consist of Cys15 and Cys16 from one subunit and Cys55 from the other (Fig. 1C). A testable prediction of that model is that the thiolates of either Cys15 or Cys16 should be close enough to the thiolate of Cys55 to cross-link with a bifunctional thiol reagent, such as bBBr, which forms a fluorescent adduct following reaction with two thiolates that are within 3-6 Å of each other (22). If so, then the homodimer would not dissociate upon SDS-PAGE following bBBr treatment. Purified wild type CgArsR1 migrated as a monomer on SDS-PAGE (Fig. 7A, lane 2). Following treatment with bBBr, a portion of the protein migrated as a fluorescent dimer (Fig. 7, A and B, lanes 1). The band corresponding to the monomer was also fluorescent, which could result from bimane adduct formation between Cys15 and Cys16 from the same monomer or from formation of bimane adducts between cysteine thiolates and solvent molecules. The single C15S and C16S mutants similarly formed fluorescent dimers (Fig. 7B, lanes 3 and 5). The triple C15S/C16S/C55S mutant did not dimerize or become fluorescent (data not shown). These results indicate that both Cys15 and Cys16 from one subunit are within 3-6 Å of the thiolate of Cys55 in the other subunit (but not necessarily simultaneously, as suggested in Fig. 1C). These results are consistent with the hypothesis that the CgArsR1 As(III) binding site is composed of two cysteine residues from one subunit and one cysteine residue from the other.
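The distance criterion behind the cross-linking experiment can be applied to any structural model by measuring sulfur-sulfur separations. The sketch below shows such a check against the 3-6 Å window cited for bBBr. The coordinates are hypothetical placeholders, not values from the CgArsR1 homology model, and the function name is illustrative.

```python
import math

Coord = tuple[float, float, float]


def within_crosslink_range(
    sulfur_a: Coord, sulfur_b: Coord, low: float = 3.0, high: float = 6.0
) -> bool:
    """True if two sulfur atoms lie within the low-high distance window (in Angstrom)."""
    return low <= math.dist(sulfur_a, sulfur_b) <= high


# Hypothetical sulfur coordinates in Angstrom (placeholders, not model data).
cys15_sg = (0.0, 0.0, 0.0)
cys55_sg_other_subunit = (3.0, 4.0, 0.0)

print(math.dist(cys15_sg, cys55_sg_other_subunit))  # prints 5.0
print(within_crosslink_range(cys15_sg, cys55_sg_other_subunit))  # prints True
```

A static model gives only one snapshot, so a pair that falls slightly outside the window in a model could still cross-link in solution if the N-terminal region is flexible.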

DISCUSSION
The ArsR/SmtB family of metalloregulatory proteins is large and diverse, with more than 3000 members identified in archaea, prokaryotic genomes, and plasmids. They respond to a number of transition metals, heavy metals, and metalloids (1-3). Only a few have been examined at the structural level (9,23,24), but a striking observation is that, although they all appear to have evolved from a common winged helix repressor, their metal(loid) binding sites appear to have evolved independently of each other and occur in different spatial locations in their respective structures (3,12,25) (Fig. 8). These inducer binding sites can be located either near the DNA binding domain, as in R773 ArsR or CadC, or near the C-terminal dimer interface, as in SmtB, CmtR, or AfArsR. For softer metals and metalloids, such as As(III), Sb(III), and Cd(II), the binding sites are composed of three or four cysteine residues. For harder metals, such as Zn(II) in the binding sites of SmtB and CadC, carboxylate oxygens and imidazole nitrogens form the sites. Moreover, the binding sites are in some repressors composed of residues within a single subunit (R773 ArsR or AfArsR) and in others formed between subunits of the homodimer (CadC, SmtB, or CmtR). Finally, the metal(loid) binding ligands are frequently located at the beginning or end of helices, where unraveling these helices may induce dissociation of the repressor from the operator/promoter DNA. Ends are presumably preferred because it would take more energy to break the helix if the ligands were located in the middle.
However, that reasoning is based on only a small number of examples from the thousands of homologous sequences in databases. For that reason, we characterized two other members of the ArsR/SmtB family encoded by the two ars operons of C. glutamicum (13). As(III) induces gene expression in cells of C. glutamicum, justifying their designations as As(III)-responsive repressors CgArsR1 and CgArsR2. It was clear from the structure-based alignment that they lack the As(III) binding sites of the other two known As(III)-responsive repressors: the one from plasmid R773, which has an S3 site in the DNA binding domain, and the one from A. ferrooxidans, which has an S3 site in the C-terminal dimerization domain.
From the near edge x-ray absorption fine structure results, CgArsR1 also binds As(III), and the EXAFS shows that it is an S3 site. Since CgArsR1 has only three cysteine residues, Cys15, Cys16, and Cys55, these are the logical candidates for the S3 site. Consistent with this idea, mutations in any of the three resulted in loss of As(III) responsiveness in vivo and loss of dissociation from the ars operator/promoter DNA in vitro. The CgArsR1 model suggests that the S3 site is formed intermolecularly between Cys15 and Cys16 in α1 and Cys55, which is located just before α4 of the helix-turn-helix DNA binding domain. This is reminiscent of the S4 Cd(II) binding site of CadC, but none of the cysteine residues in CgArsR1 align with those in CadC. Thus, it is highly likely that the As(III) binding site in CgArsR1 evolved independently of the As(III) binding sites in R773 ArsR or AfArsR or the Cd(II) binding sites of CadC or CmtR.
As discussed above, we have noted that the metal(loid) binding site in each of these repressors has ligands that are located near the start or end of an α helix and have suggested that unraveling of a helix from one end is involved in the conformational change that dissociates the repressor from its cognate DNA sequence. We propose that this is the mechanism of derepression in CgArsR1. Although a homology model is not a structure, it predicts that Cys15 and Cys16 are close enough to Cys55 to simultaneously coordinate to As(III), and the bBBr cross-linking results support that idea. How do they form a three-coordinate binding site? Cys15 and Cys16 are a vicinal cysteine pair, which should form one of the strongest types of As(III) binding sites. However, they are predicted to be at the beginning of α1 (Fig. 1C), and their thiolates cannot approach each other closely enough to bind As(III) two-coordinately. We propose that Cys55 and either Cys15 or Cys16 first bind As(III) in a weak complex. This is followed by unraveling of the beginning of α1, exposing the third thiolate, which then completes the high affinity As(III) binding site and leads to dissociation of CgArsR1 from the DNA.
FIGURE 8. [...] on a surface model of the CadC aporepressor structure either by coloring CadC residues corresponding to each binding site as identified from the structure-based alignment (Fig. 1A) or, in the case of CmtR, by overlaying the two structures. The S3 As(III) binding site of the R773 ArsR (red) formed within each monomer overlaps with the corresponding S4 Cd(II) binding site of CadC (yellow) formed between the N terminus of one subunit and the DNA binding domain of the other subunit. Zn(II) binding sites of CadC and SmtB (cyan) formed between the antiparallel C-terminal α6 helices also overlap. The S3 binding sites of CgArsR1 (green), CmtR (blue), and AfArsR (purple) are at a variety of locations distributed over the surface of the repressor.
In summary, the identification of a third As(III)-responsive repressor in which the inducer binding site is different from that of R773 ArsR, AfArsR, CadC, SmtB, or CmtR supports our hypothesis that the binding sites are the result of independent and recent evolutionary events. Just as animals have a body plan of bilateral symmetry, these repressors are built using a preexisting scaffold of an ancestral winged helix DNA binding protein. Just as animals such as birds, bats, flying squirrels, and flying fish evolved wings convergently, so too have the inducer binding sites of ArsR/SmtB homologues evolved in diverse locations on the surface of the protein in response to environmental pressures by spatial positioning of residues to form three- or four-coordinate metal(loid) binding sites (Fig. 8).