The Structure of CodY, a GTP- and Isoleucine-responsive Regulator of Stationary Phase and Virulence in Gram-positive Bacteria

CodY is a global regulator of transcription in Gram-positive bacteria. During growth, it represses genes required for adaptation to nutrient limitation, including virulence genes in some human pathogens. CodY activity is regulated by GTP and branched chain amino acids, metabolites whose intracellular concentrations drop as cells enter stationary phase. Although the CodY sequence is highly conserved, it has no significant similarity to proteins of known structure. Here we report crystal structures of two fragments of CodY from Bacillus subtilis that clearly constitute its cofactor and DNA binding domains and reveal that CodY is a chimera of previously observed folding units. The N-terminal cofactor-binding fragment adopts a fold reminiscent of the GAF domains found in cyclic nucleotide phosphodiesterases and adenylate cyclases. It is a dimer stabilized by an intermolecular six-helix bundle that buries an extensive apolar surface rich in residues that are invariant in CodY orthologues. The branched chain amino acid ligands reside in hydrophobic pockets of each monomer, distal to the dimer-forming surface. The C-terminal DNA binding domain belongs to the winged helix-turn-helix family. The implications of the structure for DNA binding by CodY and its control by cofactor binding are discussed.

The chromosomes of bacteria are packed with genes that enable the organism to exploit diverse ecological niches and to survive adversity. The key to adaptation and survival is to interpret indicators of a changing environment and respond by modulating the repertoire of genes being expressed. A fundamental challenge to survival is posed by the depletion of nutrients and the onset of starvation. Bacillus subtilis has at its disposal an array of adaptations to poor growth conditions, including secretion of macromolecule-degrading enzymes to scavenge what nutrients are available, synthesis and secretion of antibiotics that allow more effective competition for these nutrients, motility to seek new sources of nutrients, development of competence to take up exogenous DNA (which may confer a genetic advantage), or, in extreme circumstances, abandoning growth altogether and forming a dormant spore.
CodY was first discovered as a repressor of the dipeptide permease operon in B. subtilis (1). It has since been recognized as having a much wider role in controlling the expression of stationary phase genes and is now known to regulate over 100 genes distributed across some 70 operons (2). The CodY regulon encodes extracellular degradative enzymes, transporter proteins, catabolic enzymes, factors involved in genetic competence, antibiotic synthesis pathways, chemotaxis proteins, and sporulation proteins. During rapid growth, CodY thus represses genes whose products allow adaptation to nutrient depletion. The repressor function of CodY is activated by two different effectors, GTP and isoleucine or valine, which may be viewed as sensors of the energetic and metabolic status of the cell, respectively (3-5). These co-repressors act independently and additively to increase the affinity of CodY for its target sites on DNA. CodY binds its ligands selectively but with moderate affinities (in the millimolar range), consistent with a need to monitor the concentrations of GTP and isoleucine/valine prevailing in rapidly growing cells.
CodY is highly conserved in the low G+C Gram-positive bacteria. In Lactococcus lactis, CodY regulates the expression of genes encoding extracellular peptidases, peptide transport proteins, and intracellular enzymes involved in peptide and amino acid utilization (4). In pathogenic bacteria such as Streptococcus pyogenes, Listeria monocytogenes, Enterococcus faecalis, Bacillus anthracis, and Clostridium difficile, accumulating evidence points to a role for CodY in the regulation of virulence gene expression (6).
CodY from B. subtilis is a dimer of 29-kDa subunits. Its 259-residue sequence is unrelated to that of any protein other than CodY orthologues. A putative helix-turn-helix (HTH) motif has been identified close to the C terminus of the protein, and specific residues in this motif are required for high affinity DNA binding (7,8). The only other structural clue provided by the sequence is a set of possible GTP binding motifs, also located in the C-terminal half of the molecule (3). To investigate the structural basis of GTP- and branched chain amino acid (BCAA)-dependent regulation of stationary phase gene expression, we embarked on crystallographic studies of CodY. Although we have been able to crystallize the intact protein (9), the crystals diffracted weakly and were unsuitable for structure determination. We therefore sought to identify stable proteolytic fragments of CodY as alternative targets for structure determination. Here we report the crystal structures of N- and C-terminal fragments comprising residues 1-155 and 168-259, respectively.

EXPERIMENTAL PROCEDURES
Protein Preparation and Crystallization-CodY was purified as a C-terminal histidine-tagged protein (9) and partially digested with a series of proteases. Mass spectrometry of the digestion products led us to conclude that a region containing a cluster of 12 highly charged residues (156-167) was particularly sensitive to proteolysis. Using ligation-independent cloning methods, we prepared pET28a-derived constructs encoding CodY fragments spanning residues 1-155 and 168-259, each with an N-terminal His6 tag. The recombinant CodY fragments were overexpressed in Escherichia coli BL21 and purified by (i) Ni2+-chelation chromatography on a 5-ml high-performance chelating-Sepharose column (Amersham Biosciences) charged with nickel ions and (ii) gel filtration chromatography. For CodY-(1-155) a HiLoad Superdex 200 preparation grade column equilibrated in 50 mM Tris, pH [...] (10). Crystals of CodY-(1-155) were grown in hanging drops containing a 1:1 volume ratio of 20 mg ml⁻¹ protein in gel filtration buffer and 18% polyethylene glycol monomethyl ether 5000 in 0.1 M Tris-HCl, pH 7.0, 0.2 M calcium acetate, 10 mM Gpp(NH)p, and either 10 mM isoleucine (native protein) or 10 mM valine (SeMet protein). Crystals of CodY-(168-259) were grown similarly from 1:1 mixtures of 20 mg ml⁻¹ protein in 100 mM sodium citrate, pH 5.6, 0.2 M NaCl and 32% saturated ammonium sulfate in 0.1 M bis-Tris, pH 6.5, 5% glycerol, and 5% Tacsimate (Hampton Research).
Partial Proteolysis of CodY-CodY-His6 protein (84 μg, purified as described above) or bovine serum albumin, Fraction V (BSA; 84 μg; Serologicals Products, Inc.), was preincubated at 25 °C for 10 min with or without BCAAs (10 mM each of isoleucine, leucine, and valine). We then added trypsin (0.26 μg, Sigma) or α-chymotrypsin (0.52 μg, Sigma) and incubated the samples at 25 °C in 0.12 ml of a buffer containing 20 mM Tris-HCl, pH 8, 1 mM MgCl2, 1 mM dithiothreitol, and 0.5 mM EDTA (for trypsin) or 2 mM CaCl2 (for α-chymotrypsin). At various times, 0.01-ml samples were removed, denatured, and subjected to SDS-PAGE. After electrophoresis, the proteins were electrotransferred to nitrocellulose sheets. Bands that stained with Coomassie Blue were excised from the nitrocellulose and subjected to N-terminal sequencing by successive Edman degradation at the Tufts Protein and Nucleic Acid Core Facility.
Structure Determination-Native data from crystals of CodY-(1-155) were collected to 1.7-Å resolution on beamline 10.1 (λ = 0.9800 Å) at the SRS (Daresbury, UK), using a MAR Research CCD165 detector. Three-wavelength SeMet data for CodY-(1-155) were collected to 2.3-Å resolution on beamline BM14 at the ESRF (Grenoble, France). Data were processed and scaled using the program HKL2000 (11). The crystals were assigned to space group P2₁2₁2 with one molecule per asymmetric unit. The structure was solved by MAD phasing using the program HKL2MAP (12). The model was built using the program ARP/wARP (13) with manual refitting in the program COOT (14) and refined with REFMAC (15). The electron density maps were of excellent quality, and all 155 CodY residues of the single molecule in the asymmetric unit were defined, together with an isoleucine ligand and 175 solvent molecules. The final free R factor is 21.2%, R work is 15.3%, and the root mean square deviations of bond lengths and bond angles from ideal geometry are small (Table 1).
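For readers outside crystallography, the two agreement statistics quoted for each structure can be made concrete. The Python sketch below assumes the standard definition R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|, with R work computed over the reflections used in refinement and the free R factor over a small test set withheld from refinement. The structure-factor amplitudes and free-set flags are hypothetical toy values, not data from this study; REFMAC reports these statistics itself.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Return (R_work, R_free); is_free flags reflections withheld from refinement."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes for four reflections; the last one is in the free set.
f_obs = [100.0, 200.0, 50.0, 80.0]
f_calc = [90.0, 210.0, 55.0, 60.0]
is_free = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")  # R_work = 0.071, R_free = 0.250
```

Because the free set plays no part in refinement, the free R factor is normally higher than R work, as it is for both structures reported here.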
The crystals of CodY-(168-259) were in space group P422. The crystals were weakly scattering, and the x-ray diffraction data were limited to 2.8-3.0 Å spacing. The structure was solved using three-wavelength MAD data, and the model was built and refined as described above. The model contains a complete A chain and all but residues 234-235 and 257-259 in chains B and C. The temperature factors of chain C are significantly higher than those of chains A and B. The final free R factor is 24.8%, R work is 20.0%, and the geometry is satisfactory (Table 1).

Fig. 2 (caption fragment): [...] (35). The secondary structure elements in B. subtilis CodY are indicated above the alignment. Invariant residues are highlighted with a dark background, whereas conserved residues are boxed. Residues whose side chains in B. subtilis CodY form prominent interactions with the isoleucine ligand are indicated by asterisks below the alignment, whereas residues whose side chains form prominent dimer-forming interactions are indicated by filled triangles.

APRIL 21, 2006 • VOLUME 281 • NUMBER 16

Tertiary Structure of CodY-(1-155)-CodY-(1-155) has a three-layered globular structure. At the base of the molecule, as shown in Fig. 1A, is a three-helix bundle. Two of the helices are at the N terminus of the chain, whereas the third is at the C terminus. Helices α2 and α5 pack against a central five-stranded antiparallel β-sheet with strand order β2-β1-β5-β4-β3 (Fig. 1, A and B). The top layer, which does not lend itself to such simple description, is formed by two extended loops that connect strands β2 and β3, and β3 and β4. The β2-β3 segment contains two α-helices, α3 and α4, whereas the β3-β4 loop lacks any recognizable secondary structure. These loops form the walls of a cavity whose floor is formed by the β-sheet itself.
Amino Acid Binding-The electron density maps clearly define an isoleucine ligand, consistent with the inclusion of this amino acid in the crystallization drops (Fig. 1D). There was no evidence, however, for binding of the GTP analogue Gpp(NH)p. The isoleucine is bound above the β-sheet in such a way that the β2-β3 and β3-β4 loops effectively clasp the ligand. The sec-butyl side chain of the ligand projects down toward the sheet, where it is enclosed in a hydrophobic cavity circumscribed by the side chains of Met62, Met65, Phe71, Pro72, Tyr75, and Pro99 (Fig. 1E) and by main chain atoms of residues 97-99. Valine binds to CodY in a very similar manner; the only significant difference arises from the missing methylene group of valine relative to isoleucine, which may be associated with a small rotation of the phenyl ring of Phe71 (Fig. 1F). The α-amino and α-carboxylate groups of the amino acid ligand form polar interactions with the protein and solvent. The α-amino group forms charge-dipole interactions with the main chain carbonyl groups of Thr96 and Phe98 and a hydrogen bond to the surface water molecule WAT24. The carboxylate group forms a two-pronged ion-pairing interaction with the guanidinium group of Arg61 and further polar interactions with the main chain amide of Val100 and the water molecule WAT143.
The residues making up the isoleucine-binding pocket are strongly conserved in the Bacillus, Listeria, Enterococcus, Staphylococcus, Streptococcus, and Lactococcus spp., although, surprisingly, they are less well conserved in the Clostridium spp. (Fig. 2). Arg61, which binds the ligand carboxylate, is replaced by apolar side chains in CodY from C. difficile and Clostridium perfringens, whereas the apolar Met65, which contacts the apolar side chain of the ligand, is replaced by Glu or Lys in these species. This suggests that the mode of ligand binding differs in the clostridial proteins.
Quaternary Structure-Analysis of the molecular packing suggests an obvious dimer, consistent with the behavior of CodY-(1-155) on gel filtration during purification and with the knowledge that intact CodY is a dimer (9).⁴ In this dimer, the subunits are related by a crystallographic 2-fold axis. Intermolecular contacts are mediated principally by the first and last α-helices of each subunit (Fig. 3). These helices come together to form a four-helix bundle that is extended to a six-helix bundle by the adjacent α2 helices of the two subunits (Fig. 3). Each subunit contributes ~1050 Å² of its surface area to the contact interface, corresponding to 13% of the total accessible surface area, which is typical of a dimer interface (16).
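The interface figures can be checked with simple arithmetic. One common convention, assumed here, defines the area buried by each subunit as half of the accessible surface area (ASA) lost on complex formation, (ASA_A + ASA_B - ASA_AB)/2. If the 13% refers to the ASA of an isolated subunit (an assumption; the text does not say so explicitly), then 1050 Å² implies a subunit ASA of roughly 8,100 Å². The ASA values in this Python sketch are hypothetical, chosen only to reproduce the quoted numbers.

```python
def buried_area_per_subunit(asa_a, asa_b, asa_complex):
    """Surface area (Å^2) buried per subunit: half the ASA lost on forming the complex."""
    return (asa_a + asa_b - asa_complex) / 2.0


def implied_subunit_asa(buried_area, fraction):
    """Subunit ASA implied by a buried area that is a given fraction of it."""
    return buried_area / fraction


# Hypothetical ASA values (Å^2) for a symmetric homodimer.
asa_monomer = 8077.0
asa_dimer = 14054.0
buried = buried_area_per_subunit(asa_monomer, asa_monomer, asa_dimer)
print(buried)                                  # 1050.0
print(round(implied_subunit_asa(1050, 0.13)))  # 8077
print(round(100 * buried / asa_monomer))       # 13
```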
Highly conserved and/or invariant residues among CodY orthologues cluster at the dimer interface. Two intermolecular salt bridges are formed between the pairs of Arg8 and Glu144 residues, which engage in two-pronged interactions with each other. These interactions appear to be crucial for CodY structure and/or function, as both residues are invariant in the CodY orthologues (Fig. 2). An additional pair of hydrogen bonds is formed between Gln15 and Thr148, the former residue being strongly conserved and the latter invariant.
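The distinction between invariant and conserved positions used in Fig. 2 can be expressed as a simple rule on alignment columns: a column is invariant if every orthologue has the same residue, and conserved if all of its residues fall in a single similarity class. The Python sketch below uses a hypothetical set of similarity classes and toy sequences; it is not the procedure or alignment used for Fig. 2, whose scoring of conservation may differ.

```python
# Hypothetical similarity classes; real conservation scoring schemes vary.
SIMILARITY_CLASSES = ["AGS", "ILMV", "FWY", "KR", "DE", "NQ"]


def classify_column(column):
    """Label one alignment column as invariant, conserved, variable, or gapped."""
    residues = set(column)
    if "-" in residues:
        return "gapped"
    if len(residues) == 1:
        return "invariant"
    if any(residues <= set(cls) for cls in SIMILARITY_CLASSES):
        return "conserved"
    return "variable"


def classify_alignment(sequences):
    """Classify every column of a gapped multiple sequence alignment."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    return [classify_column([seq[i] for seq in sequences]) for i in range(length)]


# Toy alignment of three hypothetical orthologues.
toy = ["RKSE-", "RRAWQ", "RKGEQ"]
print(classify_alignment(toy))
# ['invariant', 'conserved', 'conserved', 'variable', 'gapped']
```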
In the dimer, the isoleucine ligands of the two subunits lie distal to the dimer interface, and their Cα atoms are 57 Å apart (Fig. 3). The C-terminal Arg155 residues, which in the intact protein are connected to the DNA binding domains, are close together in space.
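Because the dimer is generated by a crystallographic 2-fold axis, a point at perpendicular distance r from that axis and its symmetry mate are 2r apart; a 57-Å separation therefore places each isoleucine Cα about 28.5 Å from the dimer axis. The Python sketch below assumes the 2-fold is the one parallel to c, which in space group P2₁2₁2 is the operator (x, y, z) → (-x, -y, z), possibly combined with a lattice translation that the text does not specify. The cell edges and fractional coordinates are hypothetical.

```python
import math


def twofold_along_c(frac, translation=(0.0, 0.0)):
    """Apply (x, y, z) -> (-x + tx, -y + ty, z); the axis passes through (tx/2, ty/2)."""
    x, y, z = frac
    tx, ty = translation
    return (-x + tx, -y + ty, z)


def to_cartesian(frac, cell):
    """Fractional -> Cartesian (Å) for an orthorhombic cell with edges a, b, c."""
    return tuple(f * edge for f, edge in zip(frac, cell))


def distance_to_c_axis(cart, axis_xy):
    """Perpendicular distance (Å) from a point to a line parallel to c."""
    return math.hypot(cart[0] - axis_xy[0], cart[1] - axis_xy[1])


cell = (60.0, 70.0, 40.0)              # hypothetical cell edges in Å
ca_frac = (0.2, 0.3, 0.1)              # hypothetical ligand C-alpha position
mate_frac = twofold_along_c(ca_frac)   # mate generated by the axis through x = y = 0

ca, mate = to_cartesian(ca_frac, cell), to_cartesian(mate_frac, cell)
r = distance_to_c_axis(ca, (0.0, 0.0))
print(round(math.dist(ca, mate), 2), round(2 * r, 2))  # equal by construction
print(57.0 / 2)  # distance from the axis implied by the observed 57-Å separation
```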
CodY Has a GAF Domain-Despite the absence of any substantial sequence homology, a number of structures in the Protein Data Bank contain domains that can be superimposed onto CodY-(1-155) with positional root mean square deviations of 2.5-4.0 Å for 80-120 matching Cα atoms (17,18). The shared characteristic of these proteins is a GAF domain, named after its discovery, on the basis of sequence analysis, in cGMP-stimulated phosphodiesterases, adenylate cyclases, and the bacterial transcription regulator FhlA (19). GAF domains are now recognized in numerous signaling and sensory proteins.
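Root mean square deviations such as these are computed after optimally superimposing the matched Cα atoms. The Python/NumPy sketch below implements the Kabsch (SVD-based) superposition for two coordinate sets whose atom-to-atom correspondence is already known. The structure-comparison tools cited (17,18) must also decide which Cα atoms match, which this sketch does not attempt, and the coordinates are a hypothetical toy example.

```python
import numpy as np


def superposed_rmsd(p, q):
    """RMSD (Å) between matched N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 2 or p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of matched atoms")
    # Remove translations by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Kabsch: SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p_c @ rotation.T
    return float(np.sqrt(((fitted - q_c) ** 2).sum(axis=1).mean()))


# Toy check: a rotated and translated copy superimposes with RMSD ~ 0.
p = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 3.8], [1.9, 1.0, 5.0]])
theta = np.radians(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
q = p @ rz.T + np.array([10.0, -5.0, 2.0])
print(round(superposed_rmsd(p, q), 6))  # ~0 for an exact rigid-body copy
```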
The structure of the GAF domain was first defined in a study of the yeast protein YKG9 (Protein Data Bank entry code 1f5m (18)), and other GAF domain structures in complex with ligands have subsequently been determined (Protein Data Bank entry codes 1mc0 and 1ykd (20,21)). The superposition of CodY and YKG9 shows clearly that CodY belongs to the GAF domain family (Fig. 4, A and B). The β-sheet and two of the α-helices that pack against it are superimposable in all GAF domain proteins, with family members distinguished by differences in the loops connecting β-strands on the face of the sheet distal to these helices. In the ligand-responsive members of the family (20,21), these loops shape the binding pocket.
Structure of the C-terminal Domain-CodY-(168-259) has a compact globular structure with an antiparallel β-sheet of topology β6-β8-β7 and five α-helices, arranged with respect to the strands in the order α6-α7-β6-α8-α9-β7-β8-α10 (Fig. 1C). Helices α8 and α9 correspond to the predicted helix-turn-helix spanning residues 203-226. The sequence in this region is exceptionally highly conserved among CodY orthologues (Fig. 2). The role of this HTH in DNA binding has been established by site-directed mutagenesis, which showed that non-conservative substitutions of Ala207, Arg214, Ser215, and Val218 dramatically lower the affinity of CodY for ilvB and dpp promoter DNAs (8). The structure presented here shows that Arg214, Ser215, and Val218 are situated at the beginning of the putative recognition helix with their side chains pointing outward from the molecule, consistent with a role in DNA recognition. Ala207 resides on the preceding helix at a position where a small side chain is required for the two HTH helices to pack as observed.
Structural comparisons reveal that the DNA binding domain of CodY belongs to the winged HTH family of nucleic acid-binding proteins (17,22). These domains consist of three α-helices (α7, α8, and α9 in CodY) packing against a three-stranded antiparallel β-sheet. The so-called "wings," W1 and W2, are formed by the loop connecting the second and third strands (β7-β8) and by a looping segment following the third strand (β8), respectively. In three dimensions, these segments flank the second helix of the HTH much like the wings of a butterfly.

⁴ K. Matsuno and A. L. Sonenshein, unpublished data.
BCAA-induced Conformational Change-Because crystals prepared in the absence of BCAAs did not diffract well, we sought an alternative way of testing whether interaction with BCAAs induces a conformational change in CodY. We observed that trypsin has a preferred cleavage site after Lys169. Cleavage at other lysine and arginine residues also occurs, albeit with much lower efficiency (Fig. 5A). In the presence of branched chain amino acids, the major cleavage site was unchanged, but none of the minor sites was recognized by trypsin (Fig. 5B). Isolation of each of the bands in Fig. 5A allowed us to determine the N-terminal sequence of each fragment and thereby deduce the sites of cleavage. Based on this analysis, we determined that BCAAs had no effect on cleavage after Lys169 but almost completely inhibited cleavage at Lys64, Arg69, Arg130, and Arg156. As a control, we showed that the cleavage pattern of CodY was not affected by threonine or methionine (data not shown) and that cleavage of BSA was the same in the presence or absence of BCAAs (Fig. 5, C and D). Incubation of CodY with α-chymotrypsin yielded at least seven fragments (Fig. 5E). Of these, only cleavages after Tyr51, after Phe80, and near Tyr181 occurred in the presence of BCAAs (Fig. 5F).
Cleavage at other sites, such as Tyr95, Tyr145, and Met173, was completely inhibited by BCAAs. Again, cleavage of BSA was unaffected by BCAAs (Fig. 5, G and H). Notably, the presumed conformational change in CodY that accompanies binding of BCAAs can affect residues far removed in space from the co-repressor binding site.
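The logic used to map the cleavage sites, locating each fragment's N-terminal Edman sequence within the full-length protein and taking the residue immediately before it as the site of cleavage, can be sketched in Python as follows. The protein sequence and reads are hypothetical and are not CodY; the trypsin check assumes the usual rule that trypsin cuts after Lys or Arg, generally not when the next residue is Pro.

```python
def find_all(sequence, read):
    """0-based start indices of every occurrence of read in sequence."""
    hits, i = [], sequence.find(read)
    while i != -1:
        hits.append(i)
        i = sequence.find(read, i + 1)
    return hits


def map_cleavage_site(sequence, read):
    """Return (1-based position of the residue before the cut, that residue, trypsin-like?)
    or None if the read is absent, ambiguous, or is the protein's own N terminus."""
    hits = find_all(sequence, read)
    if len(hits) != 1 or hits[0] == 0:
        return None
    start = hits[0]
    before, after = sequence[start - 1], sequence[start]
    trypsin_like = before in "KR" and after != "P"
    return start, before, trypsin_like


# Hypothetical 20-residue protein and Edman reads (not the CodY sequence).
protein = "MSLKAGDRPVETKLLSRAGE"
for read in ["AGDRP", "LLSR", "PVET", "AG", "MSL"]:
    print(read, map_cleavage_site(protein, read))
```

Requiring a unique match matters in practice, because the short reads obtained from a few Edman cycles can occur more than once in a protein.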

DISCUSSION
DNA binding and transcriptional repression by CodY are activated by the binding of the effector molecule, be it GTP or a BCAA (3,5). The proteolytic fragmentation of CodY and the resulting structural studies presented here lead to the conclusion that CodY is a modular protein made up of an N-terminal cofactor binding domain and a C-terminal DNA binding domain, belonging to the GAF and winged HTH domain families, respectively. These domains have been characterized in other systems, but their combination within a single polypeptide chain has not been reported before.
The exact mechanism by which binding of co-repressors increases the affinity of CodY for its target sites remains unclear (see below), but it seems clear that interaction with BCAAs alters the conformation of CodY enough to change the susceptibility of certain residues to proteolytic enzymes (Fig. 5). These residues include some outside the N-terminal domain, suggesting that binding of BCAAs induces a conformational change that is transmitted to the C-terminal, DNA-binding domain.
Implications for GTP Binding-The GAF domain complexes described here define the BCAA binding site in CodY, leaving open the question of how GTP is bound. In the absence of a structure of a GTP complex, we have examined other GAF domain proteins and CodY sequence alignments for other clues to nucleotide binding. Two of the known GAF domain protein structures contain nucleotide ligands: cyclic guanosine monophosphate (cGMP) in mouse phosphodiesterase 2A (PDE-2A) and a pair of cyclic adenosine monophosphate (cAMP) ligands in the tandem GAF domains of a cyanobacterial adenylyl cyclase (20,21). The topologies of the ligand-binding sites are very similar, and the ligands and the surrounding protein backbones are closely superimposable.
Compared with isoleucine in CodY, the cGMP in PDE-2A is more deeply embedded in the GAF domain (Fig. 4C). Following superposition, the guanine base lies close to a pair of buried water molecules that occupy the largest cavity in the CodY structure (Fig. 4D). These waters are flanked by the side chains of Phe40 and Phe98, which are therefore well placed to make ring-stacking interactions with the guanine base of a putative GTP ligand. Phe98 is invariant, whereas Phe40 is conserved in all of the CodYs except those from L. lactis and the Streptococcus spp. Interestingly, the CodYs from L. lactis and Streptococcus pneumoniae (and by inference the other Streptococcus spp.) do not respond to GTP (23,24).⁵ The side chains of Gln38, Gln55, and Ser129 provide hydrogen-bonding opportunities for the buried waters and potentially for the polar groups of a guanine base. Ser129 is conserved as a small residue (Ser, Gly, or Ala) in the CodY orthologues in Fig. 2, with the exception of those from L. lactis and the Streptococcus spp., which may again relate to their failure to respond to GTP. It is tempting to suggest that the Trp replacements in these CodY proteins provide an indole group that fills the volume otherwise available to the guanine base of GTP.
The cavity containing these waters is contiguous with the isoleucine-binding pocket, although it narrows considerably before opening out into the latter. However, because ligand binding is probably accompanied by a claw-like closure of the β2-β3 and β3-β4 loops, these elements could adapt their conformation so as to embrace a GTP ligand. The nucleotide base might thus bind in this cavity, with the ribose and triphosphate moieties extending toward the surface, where Arg61 could make important interactions with the γ-phosphate. The γ-phosphate is known to be a determinant of GTP action, because GDP is not an effector of CodY (3). This speculative model for nucleotide binding predicts overlapping binding sites for GTP and isoleucine. The implied mutually exclusive binding of co-repressors, however, is not easily reconciled with the observation that GTP and isoleucine have independent and additive effects on DNA binding by CodY. Although we have so far been unable to grow crystals of CodY-(1-155) in the absence of isoleucine/valine, the full-length protein has been crystallized both in the absence of effectors and in the presence of GTP as sole effector (9), leaving open the possibility that at least one of the GTP binding determinants, such as the putative G motifs (3), resides in the C-terminal domain.
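Why overlapping sites are hard to reconcile with independent effects can be illustrated with textbook single-site binding. If GTP and isoleucine bind at separate, non-interacting sites, the fraction of sites with GTP bound, [G]/([G] + K_G), does not depend on isoleucine; if they compete for one site, it becomes ([G]/K_G)/(1 + [G]/K_G + [I]/K_I) and falls as isoleucine rises. The Python sketch below compares the two cases; the millimolar dissociation constants and concentrations are hypothetical, and the sketch ignores DNA binding and any allosteric coupling.

```python
def gtp_occupancy_independent(gtp_mm, k_gtp_mm):
    """Fraction of sites with GTP bound when isoleucine binds elsewhere."""
    return gtp_mm / (gtp_mm + k_gtp_mm)


def gtp_occupancy_competitive(gtp_mm, ile_mm, k_gtp_mm, k_ile_mm):
    """Fraction of sites with GTP bound when GTP and isoleucine compete for one site."""
    g, i = gtp_mm / k_gtp_mm, ile_mm / k_ile_mm
    return g / (1.0 + g + i)


k_gtp, k_ile = 1.0, 1.0  # hypothetical dissociation constants (mM)
for ile in [0.0, 1.0, 4.0]:
    print(ile,
          round(gtp_occupancy_independent(1.0, k_gtp), 3),
          round(gtp_occupancy_competitive(1.0, ile, k_gtp, k_ile), 3))
```

Under competition, raising isoleucine displaces GTP, so the two effectors could not act independently and additively, which is the tension noted above.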
Implications for Cofactor Control in CodY-In other transcriptional regulatory systems, ligand binding promotes relative domain motions and/or quaternary structural changes. In CodY, the isoleucine-binding site is not linked to any obvious hinge that could mediate long-range conformational change. As the isoleucines are bound in pockets on the faces of the β-sheets distal to the dimer interface, it seems unlikely that ligand binding would alter subunit interactions. Indeed, there is no evidence for ligand-dependent quaternary structural changes in CodY.
The organization of the CodY-(1-155) subunits in the dimer places the C termini in close apposition, allowing for the possibility that the emerging DNA binding domains lie side by side. However, the 12 linker residues between residues 155 and 168, a high proportion of which have ionizable side chains (Fig. 2), are absent from both structures, and an appreciation of the juxtaposition of the domains in intact CodY will have to await the structure of the intact protein.
A system with possible structural and mechanistic analogy to CodY is TraR. TraR of Agrobacterium tumefaciens is a quorum sensing protein that responds to a homoserine lactone pheromone. The structure of TraR has been solved in complex with its cofactor and with a tra box target DNA (Fig. 6A) (25,26). The cofactor binding domain has a GAF domain fold. Like CodY, TraR is a dimer, with the monomer-monomer interface formed by the helical regions of the subunits that are distal to the effector binding sites. The DNA binding domains emerge from corresponding locations, but instead of the winged HTH domain observed in CodY, the DNA binding domain of TraR belongs to the LuxR family and has a GerE-type fold (27).
The homoserine lactone ligand is believed to stabilize TraR against intracellular proteolysis (28). Consistent with this idea, the ligand is even more deeply recessed in the hydrophobic cavity of the GAF domain (25,26) than is the isoleucine in CodY (Fig. 4C). By analogy, CodY might also become susceptible to proteolysis as BCAA/GTP levels drop when cells enter stationary phase. However, although BCAA binding alters the susceptibility of CodY to proteolysis in vitro (Fig. 5), in vivo measurements indicate that CodY protein is present at the same concentration throughout growth and stationary phase (3).
Implications for DNA Binding-A winged HTH domain with similarity to that observed in CodY occurs in the fatty acid-responsive transcription factor FadR. The structure of FadR in its complex with FadR operator DNA is shown in Fig. 6B (29, 30). The winged HTH DNA binding domain is connected to a C-terminal seven-helix bundle domain containing a hydrophobic pocket for acyl-CoA binding. FadR binds to DNA as a dimer using its recognition helix to make contacts with the major groove of the DNA and one of its wings, W1, to penetrate the minor groove.
The winged HTH domains of FadR and CodY can be superposed to give a positional root mean square deviation of 2.7 Å for the 55 matching Cα atoms. This comparison shows that the invariant residues among CodY orthologues lie on the face of the molecule corresponding to the DNA binding surface in FadR. They are located in the segment L179SYSE183 in the turn between helices α6 and α7, in residues 211-228, which span the recognition helix and the preceding turn, and in residues 232-240, which constitute the wing W1 (Fig. 2). In the wing, Arg233 and Lys238 appear well placed to interact with the backbone phosphates of the DNA.
It is interesting to note that there is no tertiary structural similarity between CodY and the leucine-responsive regulatory protein, which performs in Gram-negative bacteria many of the functions attributed to CodY in B. subtilis (31,32). It would appear that amino acid and guanine nucleotide regulation of stationary phase gene expression has evolved independently in these two systems. Finally, during review of this manuscript, a referee drew our attention to a bioinformatics analysis showing that the GAF plus winged HTH domain composition of CodY could be predicted from sequence.

⁵ R. P. Shivers and A. L. Sonenshein, unpublished data.