Structural Analyses of a Purine Biosynthetic Enzyme from Mycobacterium tuberculosis Reveal a Novel Bound Nucleotide

Background: The purine biosynthetic enzyme ATIC is a recognized candidate for the development of therapeutic drugs. Results: Structure of mycobacterial ATIC was determined and a novel bound nucleotide characterized. Conclusion: ATIC structure shows basis for half-sites reactivity and evidence for tight binding of the novel nucleotide CFAIR. Significance: CFAIR could be used as the basis for new inhibitor design against ATIC.

Enzymes of the de novo purine biosynthetic pathway have been identified as essential for the growth and survival of Mycobacterium tuberculosis and thus have potential for the development of anti-tuberculosis drugs. The final two steps of this pathway are carried out by the bifunctional enzyme 5-aminoimidazole-4-carboxamide ribonucleotide transformylase/inosine monophosphate cyclohydrolase (ATIC), also known as PurH. This enzyme has already been the target of anti-cancer drug development. We have determined the crystal structures of the M. tuberculosis ATIC (Rv0957) both with and without the substrate 5-aminoimidazole-4-carboxamide ribonucleotide, at resolutions of 2.5 and 2.2 Å, respectively. As for other ATIC enzymes, the protein is folded into two domains, the N-terminal domain (residues 1-212) containing the cyclohydrolase active site and the C-terminal domain (residues 222-523) containing the formyltransferase active site. An adventitiously bound nucleotide was found in the cyclohydrolase active site in both structures and was identified by NMR and mass spectral analysis as a novel 5-formyl derivative of an earlier intermediate in the biosynthetic pathway, 4-carboxy-5-aminoimidazole ribonucleotide. This result and other studies suggest that this novel nucleotide is a cyclohydrolase inhibitor. The dimer formed by M. tuberculosis ATIC is different from those seen for human and avian ATICs, but it has a similar ∼50-Å separation of the two active sites of the bifunctional enzyme. Evidence in M. tuberculosis ATIC for reactivity of half-the-sites in the cyclohydrolase domains can be attributed to ligand-induced movements that propagate across the dimer interface and may be a common feature of ATIC enzymes.

Purine nucleotides are required by all organisms to provide the building blocks for DNA and RNA. Two types of pathways exist in most organisms for the production of these compounds as follows: the de novo biosynthetic pathway, in which nucleotides are synthesized from 5′-phosphoribosylpyrophosphate (PRPP) in a multistep series of reactions (1); and salvage pathways, in which nucleotides are retrieved after the breakdown of nucleic acids or coenzymes (2,3). These two alternatives may be relatively more or less important in a given organism or in response to different cellular requirements. In humans, for example, the demands of rapidly dividing tumor cells require an abundant source of nucleotides, which is provided by de novo biosynthesis (4), whereas normal cellular growth may be largely maintained by salvage. Enzymes of the de novo pathway have therefore been targeted for anti-cancer drug development (5-9). Similar efforts are being made toward the development of new anti-infectives (10), and in Mycobacterium tuberculosis, the cause of tuberculosis, the de novo purine biosynthetic pathway appears to be critical for growth and survival, with 10 of 13 identified purine biosynthetic enzymes proving to be essential in a genome-wide transposon mutagenesis study (11).
The classical de novo purine biosynthetic pathway includes 10 enzymatic reactions that successively convert PRPP to 5′-inosine monophosphate (IMP) (1). Variations exist between organisms, however, with some single step reactions in eukaryotes being carried out as two separate steps in prokaryotes and with some multifunctional enzymes in eukaryotes being substituted by monofunctional enzymes in prokaryotes. In most microorganisms, 12 biosynthetic steps are required, 6 of which utilize ATP, 2 utilize glutamine, and 2 utilize N10-formyl tetrahydrofolate (THF). All of the substrates in the pathway share a common ribose 5-phosphate moiety on which the purine base is built sequentially (1). The final two steps are usually catalyzed by a single-chain bifunctional enzyme 5-aminoimidazole-4-carboxamide ribonucleotide transformylase/IMP cyclohydrolase (ATIC), also known as PurH (12). The first step in the reaction catalyzed by ATIC involves the transfer of a formyl group from the cofactor N10-formyltetrahydrofolate to the substrate 5-aminoimidazole-4-carboxamide ribonucleotide (AICAR) to form the intermediate 5-formylaminoimidazole-4-carboxamide ribonucleotide (FAICAR), which is then cyclized with the loss of water to produce the final product of the pathway, IMP (Fig. 1A). In archaea, however, these two steps are carried out by quite distinct enzymes, unrelated to PurH, and without the use of tetrahydrofolate (THF) (13).
The eukaryotic ATIC enzymes have been extensively characterized through structural analyses of the chicken (14,15) and human (16) enzymes, coupled with kinetic, mutagenesis, and inhibition studies (17,18). These studies have established detailed catalytic mechanisms for both reactions (15,18). Both enzymes are dimeric, with two distinct functional domains per monomer. The THF-dependent formyl transfer reaction takes place in the C-terminal domain of the protein, residues 200-593 in the chicken enzyme, whereas the product of this reaction, FAICAR, undergoes cyclization in the active site of the N-terminal domain, some 50 Å away (14). Intriguingly, there is no evidence of any tunnel connecting the two active sites. A feature of both the chicken and human ATIC structures was that a nucleotide was found bound in each case to just one of the two cyclohydrolase domains of the respective dimers. This was evidently carried over from expression in the Escherichia coli host used, remaining bound through all purification and crystallization steps. This nucleotide was identified as xanthosine monophosphate (XMP) on the basis of HPLC and mass spectrometry (18).

FIGURE 1. A, reaction scheme for ATIC, showing the two steps. B, overall architecture of the MtbATIC dimer, with the C-terminal formyltransferase domains above and the N-terminal cyclohydrolase domains below. The two monomers A and B are colored in light blue and gold, respectively. Potassium ions bound in each monomer are shown as spheres. The bound AICAR and CFAIR molecules are shown in magenta and green stick mode, respectively. The β-ribbon formed by the connecting peptides (green for molecule A and red for molecule B) is also shown. C, comparison of the dimers formed by MtbATIC (left, monomers A and B colored in gold and light gold, respectively) and human ATIC (right, monomers A and B colored in blue and sky blue, respectively). For comparison, the AICAR formyltransferase domains of the Mtb and human ATIC structures were superimposed, and the two molecular surface representations are shown in the same orientation. Figures were generated using PyMOL (42).
Here, we describe crystal structures of ATIC from M. tuberculosis (MtbATIC) at 2.2 Å resolution in the absence of bound substrate and at 2.48 Å resolution with the substrate AICAR bound. Interestingly, a nucleotide is again found in one of the two cyclohydrolase active sites of the dimer, evidently carried over from E. coli expression. This adds weight to the conclusion that ATIC exhibits half-of-the-sites reactivity. In contrast to the conclusions drawn for the human and chicken enzymes, however, we use NMR and mass spectrometry to identify the adventitiously bound nucleotide as a novel formylated nucleotide potentially derived from an intermediate formed in an earlier step of the purine biosynthetic pathway.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The open reading frame encoding Rv0957 (ATIC) was amplified by nested PCR from the genomic DNA of M. tuberculosis H37Rv. Gateway cloning was used as described by Moreland et al. (19) to generate the expression vector pDEST17 harboring an N-terminal His6 tag. The Rv0957 Gateway plasmid was transformed into E. coli BL21 (DE3) cells that were plated on Luria-Bertani agar plates containing 50 μg/ml ampicillin. A 5-ml pre-culture of E. coli BL21 (DE3) cells was grown in noninducing MDG media overnight at 37°C, and a 1:200 dilution of this pre-culture was used to inoculate the expression culture. Expression of the N-terminally His6-tagged MtbATIC was carried out in the autoinduction medium ZYM-5052 (20). The culture was grown at 37°C until the A600 reached 0.6 and was then transferred to 10°C for 2 days. The cells were harvested by centrifugation for 15 min (4000 × g) at 4°C, resuspended in 50 mM NaH2PO4, pH 7.5, 1 mM EDTA, 14 mM β-mercaptoethanol, and lysed with a cell disruptor (Constant Systems Ltd.) at 18,000 p.s.i. Cellular debris was separated by centrifugation (14,000 × g, 20 min, 4°C). His-tagged MtbATIC was purified by immobilized metal affinity chromatography using a 5-ml Ni2+-charged Hi-trap column (Amersham Biosciences). The ATIC-containing fractions were dialyzed overnight (4°C) against 50 mM NaH2PO4, pH 7.5, 1 mM EDTA, 14 mM β-mercaptoethanol in the presence of recombinant tobacco etch virus protease to cleave the His6 tag. A second immobilized metal affinity chromatography column to remove any remaining His6-tagged material was followed by size exclusion chromatography using an S200 10/30 column (Amersham Biosciences). The fractions containing MtbATIC were concentrated to 20 mg/ml using 10,000 molecular weight cutoff concentrators (Vivaspin).
Crystallization and Data Collection-Initial crystallization conditions were found using a Cartesian nanoliter dispensing robot (HONEYBEE™) with a 480-component screen (19). These experiments were performed at 18°C by sitting-drop vapor diffusion, using 100-nl drops of protein (20 mg/ml) mixed with 100 nl of precipitant. The most promising condition (30% PEG 8000, 0.2 M sodium acetate, 0.1 M sodium cacodylate, pH 6.5) led to the formation of one small single crystal after 2 days. After further optimizations, crystals suitable for data collection were obtained in hanging drops consisting of 1 μl of protein (20 mg/ml) and 3 μl of reservoir (24-34% PEG 8000, 0.2 M sodium acetate, 0.1 M sodium cacodylate, pH 6.5). For data collection, a native crystal was soaked for a few seconds in a solution composed of mother liquor and 20% glycerol and then flash-cooled. X-ray diffraction data to 2.2 Å resolution were measured at 100 K using CuKα radiation from a rotating anode generator (Rigaku Micromax-007HF) equipped with a MAR 345 imaging plate detector. For the AICAR complex, a native crystal was soaked for 2 min in a solution containing 100 mM AICAR (Sigma), the reservoir solution, and 20% glycerol (cryoprotectant). X-ray data to 2.48 Å resolution were collected at 100 K at the Stanford Synchrotron Radiation Laboratory. Both data sets (see Table 1) were processed using HKL2000 (21).
Structure Determination and Refinement-The native MtbATIC three-dimensional structure was solved by molecular replacement using PHASER (22). A composite search model was used comprising residues 3-195 from the human ATIC crystal structure (Protein Data Bank code 1PKX) (16) and residues 167-452 from the Thermotoga maritima ATIC (TmATIC) crystal structure (Protein Data Bank code 1ZCZ) (23). The solution contained two molecules in the asymmetric unit. An atomic model was built and adjusted using COOT (24) and refined by restrained least squares with REFMAC 5.3 (25). BUSTER 2.8 (26) was used for the latter stages of refinement. No interpretable electron density was found for the N-terminal residues of either molecule; the final model consisted of residues 5-523 in molecule A and residues 7-523 in molecule B. Residues 65-67 could also not be modeled in molecule A due to lack of interpretable density. The two molecules are very similar with a root mean square difference of 1.2 Å for 517 Cα atoms. After a few cycles of refinement, clear difference electron density was visible for an unidentified nucleotide bound to molecule A and for a potassium ion in both molecules. The nucleotide was initially modeled as XMP but was later determined to be 4-carboxy-5-formylaminoimidazole ribonucleotide (CFAIR) and was modeled as such, using the Dundee PRODRG2 server (27). The final values for Rwork and Rfree were 19.9 and 23.7%, respectively. For the AICAR complex, the refined native MtbATIC structure was used as the initial model and refined to 2.48 Å resolution with REFMAC 5.3 and BUSTER 2.8 using the same refinement procedure as for the native structure. Difference electron density maps showed clear density for the substrate AICAR in both monomers. A CFAIR molecule as the E-amide conformer and two potassium ions were also modeled in the same locations as for the native structure. Residual electron density was also found in the active site of the cyclohydrolase domain of molecule B. This was suggestive of a nucleotide, possibly an additional AICAR molecule, but only the phosphate moiety could be modeled with certainty. The final values of Rwork and Rfree for the AICAR complex were 17.9 and 22.3%, respectively. Full details of the refinement statistics and the final model for each structure are in Table 2.
Mass Spectrometry of Nucleotide Ligand-The ligand bound to MtbATIC was directly analyzed using mass spectrometry with electrospray ionization (ESI) on quadrupole-time of flight (QTOF MS) and Fourier transform ion cyclotron resonance instruments. Purified MtbATIC was exchanged into 50 mM NH4HCO3 and infused into the mass spectrometer ion source in 50% v/v acetonitrile, and the bound ligand was released in-source. ESI-QTOF MS was carried out on a QSTAR XL hybrid mass spectrometer (Applied Biosystems) in both positive and negative ion mode. Species of interest were further analyzed by MS2 fragmentation, and a major negative ion species of m/z 366 was identified as the nucleotide ligand. Precise mass measurements were then carried out on this parental species and associated MSn fragments using a LTQ Fourier transform ion cyclotron resonance mass spectrometer (Thermo Finnigan). Infusion into the ion source was at 3 μl/min with a source voltage of 2.5 kV, a tube lens setting of −100 V, and a capillary temperature of 225°C. Fragmentation of the ligand was carried out to MS5 with accurate mass determination on ions up to MS4.
Isolation of Nucleotide Ligand-The nucleotide ligand was isolated from MtbATIC by denaturation of the enzyme and purification of the released nucleotide by ion exchange chromatography. His6-tagged ATIC purified using immobilized metal affinity chromatography, as detailed above, was dialyzed into 20 mM NH4HCO3, pH 8. Solid urea was added to the dialyzed His6-tagged ATIC with stirring to give a final concentration of 6 M. In initial experiments, this preparation was also heated to 100°C for 10 min to ensure complete denaturation of the enzyme. However, this heating step was later found to be unnecessary, and a 15-min incubation at room temperature was sufficient for release of the nucleotide. The urea-containing MtbATIC solution was loaded onto a Mono Q HR 5/5 anion exchange column (GE Healthcare), and a gradient of 20-400 mM NH4HCO3 was run over 40 ml at 2 ml/min. Eluate absorbance was monitored at 220, 260, and 280 nm. One major peak was detected at a concentration of 130-150 mM NH4HCO3. Analysis of this peak by ESI-QTOF MS by direct infusion confirmed that it consisted of the m/z 366 species detected by MS analysis of MtbATIC. Peak fractions were combined and dried by lyophilization or on a centrifugal vacuum concentrator, and the nucleotide was stored at −20°C.
NMR Characterization of Nucleotide Ligand-The purified nucleotide was dissolved in D2O for analysis by NMR spectroscopy. Preliminary one-dimensional 1H NMR and temperature-dependent one-dimensional 1H NMR experiments were acquired on a DRX 400 spectrometer (Bruker). Detailed analysis of the 1H and 13C chemical shifts was carried out using one- and two-dimensional spectra acquired on an AV600 spectrometer (Bruker) fitted with a cryoprobe and at 300 K. This included 1H-1H COSY, 1H-13C HSQC, and 1H-13C HMBC experiments. Two separate chemical species were observed in all NMR spectra at a 74:26 ratio at 300 K. Temperature-dependent 1H NMR spectra were collected at 25, 40, 50, and 60°C to examine whether these two species were slowly interchanging isomers of the nucleotide. Chemical shifts are reported in parts per million relative to an external TMS reference (δ 0.0 ppm).
Computational Analysis of Nucleotide Ligand-Calculations on the predicted relative stability of the Z- and E-amide isomers of CFAIR were carried out using the Gaussian 09 software suite (28). For simplicity, the phosphate group was removed from the structures. Global energy minima were identified for both isomers by comparison of relaxed potential energy surface scans, at the RHF/3-21G* level, of the dihedral angles emanating from the imidazole ring. Gas phase calculations were performed at the RB3LYP/6-31G(d)//RB3LYP/6-31G(d) level. By comparison of the unscaled ZPE corrected electronic energies, the lowest energy Z-conformer was found to be 15 kJ/mol more stable than the lowest energy E-conformer. In repeating the conformer search and optimization using the polarizable continuum solvent model (29) at the same level of theory, the Z-conformer was again found to be more stable than the E-conformer by 4.5 kJ/mol. Assuming that the lowest energy conformations would have the greatest influence on the Boltzmann population ratio, the latter calculated energy difference corresponds to a population ratio of ∼86:14 in favor of the Z-isomer of CFAIR.
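The conversion between an energy difference and a two-state Boltzmann population ratio can be sketched as follows. This is an illustrative calculation only, not the authors' procedure: the temperature of 298.15 K used for the forward calculation is an assumption (the text does not state one), and the function names are hypothetical. With a 4.5 kJ/mol gap it gives a ratio close to the 86:14 quoted above, and run in reverse on the 74:26 ratio observed by NMR at 300 K it gives an apparent gap of roughly 2.6 kJ/mol.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def boltzmann_ratio(delta_e_kj_mol, temperature_k=298.15):
    """Return (low-energy fraction, high-energy fraction) for a two-state system.

    Assumes only the two lowest-energy conformers contribute, as in the text.
    The default temperature is an assumption, not a value from the paper.
    """
    weight_high = math.exp(-delta_e_kj_mol * 1000.0 / (R * temperature_k))
    fraction_low = 1.0 / (1.0 + weight_high)
    return fraction_low, 1.0 - fraction_low


def energy_gap_from_ratio(major, minor, temperature_k=298.15):
    """Return the energy difference (kJ/mol) implied by a major:minor ratio."""
    return R * temperature_k * math.log(major / minor) / 1000.0


# Forward: calculated solvent-model gap between the Z- and E-isomers.
z_frac, e_frac = boltzmann_ratio(4.5)
print(f"Z:E = {100 * z_frac:.0f}:{100 * e_frac:.0f}")

# Reverse: apparent gap implied by the 74:26 ratio seen by NMR at 300 K.
gap = energy_gap_from_ratio(74, 26, temperature_k=300.0)
print(f"apparent gap = {gap:.1f} kJ/mol")
```

The two-state treatment ignores higher-energy conformers and entropic contributions, so the comparison between the calculated and apparent gaps is only qualitative.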

RESULTS
Crystallographic Analyses-The MtbATIC enzyme was expressed in E. coli and crystallized in its presumed apo-form, without any added ligand. We refer to this as the apoenzyme, although one of the two molecules in the asymmetric unit was found to contain a ligand derived from the E. coli expression host. The characterization of this ligand is described later. The complex of MtbATIC with the substrate AICAR was prepared by soaking crystals of the apoenzyme with AICAR. Both structures were determined by x-ray crystallography and refined at resolutions of 2.2 and 2.48 Å, respectively. The asymmetric unit of the crystal in each case contained two independent ATIC molecules, organized as tightly associated dimers. For both structures, the 523-residue polypeptide chain was fully defined except for a few residues at the N terminus (3-4 for molecule A and 6 for molecule B) and a flexible loop of the cyclohydrolase domain (residues 65-67 in molecule A of the apo structure), for which no interpretable density could be found. Analysis with MOLPROBITY (30) showed that 99% of residues in each structure are in the most favored regions of the Ramachandran plot and that the apo and AICAR structures each have an overall MolProbity score in the 99th percentile.
Structural Organization of MtbATIC-As has been found for the human, chicken, and T. maritima enzymes, MtbATIC forms a tightly associated dimer (Fig. 1B) that is the presumed functional unit. Each monomer includes two distinct functional domains as follows: the N-terminal cyclohydrolase domain, which carries out the IMP cyclohydrolase reaction, and the C-terminal transformylase domain, which mediates the transfer of the formyl group from N10-formyltetrahydrofolate to the substrate AICAR. The two domains are joined by a linker peptide, residues 203-219, which forms a long β-ribbon that is critical to the mode of dimerization, described later.
The two major domains are very similar to their counterparts in the human, chicken, and T. maritima enzymes. The cyclohydrolase domain, residues 1-202, has a modified Rossmann fold topology, including a central 5-stranded β-sheet, with strand order 3-2-1-4-5, against which are packed three helices on each face, three more around the periphery, and a long final helix α10 that leads into the interdomain linker. The active site, in which the cyclization of FAICAR to the purine nucleotide IMP takes place, extends from the β2-α2 loop across strands β1 and β4 to a site between helix α8 and the loop joining residues 71-74.
The C-terminal transformylase domain, residues 220-523, contains two tandem subdomains, residues 242-367 and 388-523, each of them with an α + β fold comprising a mixed 6-stranded β-sheet and four α-helices arranged as in cytidine deaminase (31). Each of these subdomains is also preceded by an extended β-hairpin, residues 220-242 and 368-387. A potassium ion is present in this domain, bound to the main chain carbonyl oxygens of residues 421, 422, 424, and 520, and a carboxyl oxygen of Asp-470, with metal-ligand distances of 2.6-2.8 Å. Adjacent to this is a cis-peptide between Ser-426 and Asn-427, which is conserved in all ATIC structures. Both residues may be functionally important; Asn-427 is part of the AICAR-binding site, and Ser-426 helps orient Asp-470, one of the potassium ion ligands. The formyl transfer active site lies at the interface between the two monomers of the dimer; there are thus two such active sites per dimer. Although only the binding of the AICAR substrate has been defined in the present analysis, binding studies on the chicken ATIC enzyme, using folate analogs and a bisubstrate inhibitor, indicate that both monomers contribute equally to each active site, one predominantly binding AICAR and the other predominantly binding folate. Dimerization must thus be essential for the formyl transfer reaction, as shown previously for the human enzyme (32).
The MtbATIC dimer is classified as a parallel, nontwisted dimer according to the web server class PPI (33). It matches closely with the dimer formed by TmATIC (23), but it is fundamentally different from those formed by the human and chicken ATIC enzymes (Fig. 1C). The difference is that if the C-terminal transformylase domains of the four proteins are superimposed, the cyclohydrolase domains of the two bacterial enzymes must be rotated about 90° about the long axis of the dimer to overlay on their eukaryotic counterparts; the latter are classified as twisted dimers. The origins of this difference lie in the structure formed by the connecting peptide, because there are otherwise no contacts between the N- and C-terminal domains.
In both MtbATIC and TmATIC, the connecting peptides of the two monomers form a long antiparallel β-ribbon that follows the last helix (α10) of the cyclohydrolase domain and is oriented orthogonally to the long axis of the molecule (Fig. 1B). In MtbATIC, this β-ribbon includes residues 211-219 from each monomer, two residues more than in TmATIC, and has regular hydrogen bonds over its whole length. In contrast, a quite different substructure is found between the major domains in the two eukaryotic enzymes. In the MtbATIC dimer, the β-ribbon linker is followed immediately by the first of the two β-hairpins of the C-terminal domains and is intimately associated with it. Overall, the total buried surface within the dimer interface is about 5800 Å2, as calculated by the program PISA (34), representing about 23% of the average surface area of each monomer. Of 523 amino acid residues, about 150 contribute to this interface. Much of the buried surface is attributed to the two β-hairpins of each monomer, which interdigitate like four fingers, but the buried surface extends for almost the whole length of the molecule.
Identification of the Adventitiously Bound Ligand-Clear electron density resembling that of a purine nucleotide was found in the active site of the cyclohydrolase domain of molecule A (but not molecule B) of the apo-MtbATIC structure. Because no ligand had been added, we assume that this is an adventitiously bound ligand derived from the E. coli expression host. A similar phenomenon was observed in the crystal structures of the apo-forms of the human and chicken ATIC enzymes (14-16), in which it was modeled as XMP. This identification was not unequivocal, however, and we therefore sought to characterize the ligand bound to MtbATIC by mass spectrometry and NMR.
Initial characterization of the ligand was based on mass spectrometry of the purified MtbATIC directly. The ligand was released from the enzyme in-source during ESI-MS and detected as a negative [M − H]− ion of m/z 366.0. Fragmentation of this species by MS2 resulted in major species of m/z 322, 211, 97, and 79 (Fig. 2A). The three lower masses match those expected for phosphoribosyl, phosphate, and phosphite fragments, respectively, consistent with identification of the ligand as a nucleotide. However, the overall mass of the ligand does not correspond to any known nucleotide. This includes the AICAR substrate and IMP product of the MtbATIC reaction, which would give [M − H]− ions of m/z 337.0 and 347.0, respectively. The ligand mass is close to that for the reaction intermediate FAICAR but does not match its expected m/z of 365.0. It is also inconsistent with XMP (m/z 363.0) as modeled in the human and chicken ATIC structures.
Detailed structural analysis of the nucleotide ligand was carried out by accurate mass determination and further MSn fragmentation with Fourier transform ion cyclotron resonance MS (supplemental Fig. S1 and supplemental Table S1). An exact mass of m/z 366.034 was measured for this ion, corresponding to an elemental composition of C10H13O10N3P1− within 0.58 ppm (theoretical m/z 366.0344). This is close to the elemental composition of FAICAR (C10H14O9N4P1−), except that the ligand contains one fewer nitrogen and hydrogen atom but has an additional oxygen, accounting for the 1 atomic mass unit difference in mass between the compounds. Additional fragmentation of an MS2 species corresponding to the base moiety of the nucleotide (m/z 154.1) produced ions of m/z 126.0 and 110.1. The latter is consistent with a 5-formylaminoimidazole fragment as would be expected from a FAICAR-like structure. However, inconsistent with FAICAR, the base of the nucleotide undergoes decarboxylation, producing an m/z 322.04 species from the parental nucleotide ion. This decarboxylated species was also observed in ESI-MS without MSn fragmentation, suggesting that the base of the nucleotide contains a carboxylate group that can be readily eliminated.
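The theoretical m/z values for the two anion compositions quoted above can be checked with a short monoisotopic mass calculation. This is a minimal sketch, not the software used in the study: the isotope masses are standard values entered by hand, and the helper names `anion_mz` and `ppm_error` are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
}
ELECTRON = 0.00054858  # electron mass (u)


def anion_mz(composition, charge=1):
    """m/z of a negative ion.

    `composition` is the elemental composition of the ion itself, i.e. the
    proton(s) lost on ionization are already removed from the H count.
    """
    mass = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return (mass + charge * ELECTRON) / charge


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Anion compositions as given in the text.
ligand = anion_mz({"C": 10, "H": 13, "N": 3, "O": 10, "P": 1})
faicar = anion_mz({"C": 10, "H": 14, "N": 4, "O": 9, "P": 1})

print(f"ligand [M-H]-: {ligand:.4f}")
print(f"FAICAR [M-H]-: {faicar:.4f}")
print(f"difference:    {ligand - faicar:.4f}")
```

Swapping NH for O (formally a carboxamide for a carboxylate) gives a mass difference just under 1 unit, which is why the two compounds are distinguishable only at nominal-mass resolution or better. At m/z 366, an error of 0.58 ppm corresponds to about 0.0002 m/z units.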
Combining the elemental composition, MSn fragment analysis, and the nature of the electron density at the cyclohydrolase active site of MtbATIC, it was speculated that the unknown nucleotide ligand may be a FAICAR analog in which the carboxamide group is replaced by a carboxylate (Fig. 2D). To our knowledge, this compound, CFAIR, is not a known metabolite and has not been previously prepared synthetically. To provide further evidence for this proposed structure, we sought to isolate the ligand from MtbATIC and analyze it by NMR spectroscopy. The nucleotide ligand was released from MtbATIC by adding 6 M urea to the enzyme and was purified using anion exchange chromatography (Fig. 2B). ESI-MS confirmed that the isolated ligand was unaltered from the nucleotide detected in the experiments described above.
A range of one- and two-dimensional 1H and 13C NMR experiments was carried out on the ligand in D2O (supplemental Figs. S2 and S3). Two separate species were observed in all NMR spectra at 25°C in a 74:26 molar ratio. Initially, we speculated that this may be due to decarboxylation of the nucleotide, as a related compound and intermediate in purine biosynthesis, 4-carboxy-5-aminoimidazole ribonucleotide (CAIR), readily undergoes nonenzymatic decarboxylation at pH <8 (35,36). However, unsuccessful attempts to separate the two species and the observation that their relative molar ratio was constant in different preparations of the nucleotide suggested that these corresponded to different isomers of the nucleotide. FAICAR and the related compound 5-formylaminoimidazole ribonucleoside (FAIRs) exist as two slowly interconverting Z and E rotational isomers around the formamide bond (37,38).
Consistent with the existence of Z- and E-rotational isomers of the nucleotide ligand, temperature-dependent 1H NMR experiments from 25 to 60°C showed considerable peak broadening with heating, and some signals from the two species began to coalesce at 60°C (supplemental Fig. S4). Computational analysis of the lowest energy conformations of the Z- and E-amide isomers of CFAIR using a polarizable continuum solvent model (29) estimates that the Z-isomer is 4.5 kJ/mol more stable than the E-isomer. This correlates to a Boltzmann population of 86:14 in favor of the Z-isomer of CFAIR, suggesting that the Z-isomer is the major species observed in solution by NMR spectroscopy. In agreement with this, the analogous nucleoside FAIRs was previously shown to exist in a 78:22 ratio of Z- and E-isomers in D2O (38).
Complete assignment of 1H NMR chemical shifts was possible for the major isomer of the nucleotide (Fig. 2C). Two of the minor isomer 1H peaks were obscured by overlap with those of the major isomer. H2 of the imidazole ring was observed to undergo slow exchange with deuterium from the solvent. Through a combination of 1H-13C HSQC and 1H-13C HMBC experiments, a nearly complete assignment was also obtained for 13C NMR resonances of the major isomer. However, there was not sufficient signal to unequivocally assign the 13C peak for the significant C6 carbonyl. Overall, the 1H and 13C chemical shifts for the nucleotide (Tables 3 and 4) are similar to those previously reported for FAICAR (37). The 13C resonances for C2 and the formamide C7 of 134.1 (major), 164.8 (major), and 168.2 (minor) ppm, respectively, are particularly close to the 134.9, 164.5, and 167.6 ppm for FAICAR. Notably, there is no equivalent 13C resonance evident in the nucleotide spectra at the 165.4 ppm reported for the amide C6 of FAICAR, with the remaining unassigned peaks in this region occurring at 162.8 and 168.7 ppm.
Ligand Binding in the Cyclohydrolase Active Site-In the substrate-free structure, a bound nucleotide is present in the active site of molecule A, whereas the molecule B active site is not occupied by any ligand. The nucleotide was modeled as CFAIR, in the E-isomer conformation, given the strong mass spectral and NMR support for this assignment. In the AICAR complex, molecule A again contains a bound CFAIR ligand, which is evidently not displaced by substrate. In contrast to the apo structure, however, molecule B of the AICAR complex does contain electron density for a bound ligand. The density is weak and discontinuous, making it difficult to unequivocally identify this ligand or to model it. Because the complex was obtained by soaking AICAR into the substrate-free crystals, in which the cyclohydrolase active site of molecule B was empty, we assume the bound ligand to be AICAR, although only the phosphate can be modeled with confidence.
The CFAIR ligand in molecule A is represented by excellent electron density in both structures and has the same conformation and makes the same interactions in each case (Fig. 3A); when the two structures are superimposed based only on the protein atoms, the root mean square difference in atomic positions for the ligand is only 0.33 Å (24 atoms). The CFAIR phosphate is bound at the N terminus of helix α2, with its O2P oxygen hydrogen-bonded to the peptide NH of Ser-42, O3P to the peptide NH of Thr-43 and the Thr-43 hydroxyl group, and O1P to the Ser-42 hydroxyl group and the amino group of Lys-20. A second lysine amino group, of Lys-72, hydrogen bonds to the phosphate ester oxygen O4P and the ribose ring oxygen O4′. Hydrogen bonds linking the ribose O2′ and O3′ hydroxyls to Asp-131 OD1 and the peptide oxygen of Asn-109, respectively, complete very extensive recognition of the phosphoribose moiety. The substituted imidazole group of CFAIR has two substituents. The 4-carboxyl group sits between the 71-74 loop and the N terminus of helix α8 (residues 131-145); one carboxyl oxygen accepts hydrogen bonds from the peptide NH groups of Lys-72 and Thr-73 and the other a hydrogen bond from the peptide NH of Gly-133. The other imidazole substituent is the C5-formylamino group, which hydrogen bonds through its N5 amino nitrogen to Asp-131 OD1 and through its formyl oxygen O7 to the peptide NH of Ile-132. The CFAIR ligand is effectively buried by the side chain of Tyr-111, which packs over the top, shielding the ligand from the external solvent.
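The ligand comparison above relies on a root mean square difference computed over matched atoms after the structures have been superimposed on the protein atoms only, so the ligand coordinates themselves are not refitted. A minimal sketch of that final step is given below; the function name and the three-atom coordinate sets are hypothetical placeholders, not data from the structures.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root mean square difference between matched (x, y, z) positions.

    No superposition is performed here: the two coordinate sets are assumed
    to be in a common frame already (for example, after fitting on protein
    atoms), and atoms are assumed to be listed in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical ligand atom positions (in Å) in two pre-aligned structures.
ligand_apo = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
ligand_soaked = [(0.3, 0.0, 0.0), (1.5, 0.3, 0.0), (1.5, 1.5, 0.3)]

value = rms_difference(ligand_apo, ligand_soaked)
print(f"rms difference = {value:.2f} Å")
```

Because the fit excludes the ligand, a small value reflects both an unchanged ligand conformation and an unchanged position relative to the protein.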
In both structures, differences are seen in the molecule B cyclohydrolase active site, which is empty in the apo structure and only weakly occupied in the AICAR complex. Superposition of the molecule B cyclohydrolase domain onto the CFAIR-bound molecule A shows that ligand binding is associated with small movements of 2-3 Å by the sections of polypeptide that surround the ligand, primarily residues 17-21, 40-45, 70-73, and 108-114. These generate a more closed site around the ligand and optimize hydrogen bonding interactions, characteristic of an induced fit. There are also large movements of the Lys-72 and Tyr-111 side chains to enable Lys-72 to hydrogen bond to the ligand and the Tyr-111 phenyl ring to flip over the top of the ligand, sequestering it from solvent. These movements are seen both in the apo structure, where the molecule B site is unliganded, and in the AICAR complex, where the molecule B site appears to contain a weakly bound nucleotide. In the CFAIR-occupied site, Lys-72 NZ and Tyr-111 OH are brought close together (3.8 Å apart in the apo structure and 3.4 Å apart in the AICAR complex), whereas in the molecule B site they are 18 Å apart in the apo structure and 16 Å apart in the AICAR complex. The same movements, of similar magnitude and for the same residues, are associated with ligand binding in the cyclohydrolase domains of both the chicken and human ATIC enzymes (16). It appears that in each case, strong binding in one site is at the expense of weak or no binding in the other.
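The Lys-72 NZ to Tyr-111 OH separations quoted above are simple interatomic distances that can be read from a coordinate file. The sketch below shows one way such a distance might be measured from fixed-column PDB ATOM records. The helper names `parse_atoms` and `distance` are hypothetical, and the two sample records carry made-up coordinates (a 3-4-5 triangle), not values from the deposited MtbATIC structures.

```python
import math


def parse_atoms(pdb_text):
    """Map (chain, residue number, atom name) -> (x, y, z) from PDB ATOM/HETATM lines."""
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-column fields of the PDB format (0-based slices).
        key = (line[21], int(line[22:26]), line[12:16].strip())
        atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def distance(p, q):
    """Euclidean distance between two (x, y, z) points, in the units of the input."""
    return math.dist(p, q)


# Invented records for illustration only; coordinates are NOT from the real structure.
SAMPLE = "\n".join([
    "ATOM    551  NZ  LYS A  72      10.000  12.000   8.000  1.00 20.00",
    "ATOM    870  OH  TYR A 111      13.000  16.000   8.000  1.00 20.00",
])

atoms = parse_atoms(SAMPLE)
d = distance(atoms[("A", 72, "NZ")], atoms[("A", 111, "OH")])
print(f"Lys-72 NZ to Tyr-111 OH: {d:.1f} A")
```

Applied to the same atom pair in molecules A and B of one structure, a measurement of this kind gives a single number that distinguishes the closed, ligand-bound site from the open one.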
AICAR Binding-In the AICAR complex, each formyl transfer active site was found to contain a bound AICAR molecule.
The key interactions between each AICAR molecule and the enzyme (Fig. 3B) are strictly conserved in the two sites. The AICAR phosphate group occupies a highly positively charged pocket surrounded by the side chains of Arg-224, Lys-254, Arg-359, and Arg-519, with each of its terminal oxygens making three geometrically favorable hydrogen bonds as follows: OP1 with Asn-259 ND2, Arg-519 NH2, and a water molecule; OP2 with Tyr-225 OH, Arg-519 NH1, and a water molecule; and OP3 with Ser-257 OG, Arg-224 NH2, and a water molecule. The ribose unit adopts a C2′-endo pucker conformation. Its hydroxyl O3′ is hydrogen-bonded to the main chain carbonyl oxygen of Gly-314 and a water and O2′ to Glu-337 OE2. Whereas the phosphate and ribose interactions are all with one monomer, the 5-aminoimidazole-4-carboxamide moiety makes contact primarily with the opposing monomer of the dimer through its carboxamide group, which is hydrogen-bonded to Arg-447 NE and the peptide oxygen of Phe-472. The AICAR-binding site appears to be largely pre-formed, although comparison of the substrate-free and AICAR complex structures shows that the Arg-224 side chain moves ∼8 Å to hydrogen bond to the AICAR phosphate moiety. The AICAR complex structure also contains a phosphate ion in each formyl transfer active site. This phosphate ion is hydrogen-bonded to the exocyclic 5-amino group of AICAR, the amino group of Lys-283, the side chain amide nitrogen of Asn-427, the peptide nitrogen of Arg-447, and (if the phosphate is protonated) the oxygen of the 4-carboxamide group. No phosphate ion is present in the apo structure at or near this site, despite identical crystallization conditions, indicating that AICAR binding generates a favorable binding site for the phosphate.

DISCUSSION
In most organisms, de novo purine biosynthesis is achieved through a series of 10-12 sequential enzymatic steps, through which the starting compound, PRPP, is converted to the purine nucleotide IMP. The first half of this pathway is dedicated to the generation of an aminoimidazole moiety attached to the ribose, whereas the second half of the pathway adds and modifies appropriate substituents on the 4- and 5-C atoms of the imidazole, leading to the final cyclization reaction that generates IMP. The enzymes that catalyze the reactions in this second half of the pathway must therefore accommodate rather similar substrates, with a common 5′-phosphoribosylimidazole core. This makes the problem of identifying adventitiously bound ligands particularly acute.
The bifunctional enzyme ATIC (PurH) catalyzes the final two steps of de novo purine biosynthesis, the penultimate step being the transfer of a formyl group from N10-formyltetrahydrofolate to the 5-amino group of the substrate AICAR to give the 5-formylamino product FAICAR. The final step then cyclizes FAICAR to give IMP. The first ATIC structure to be determined, the chicken enzyme, revealed two independent active sites ∼50 Å apart in distinct structural and functional domains. Surprisingly, an adventitiously bound nucleotide was found in one, but not both, of the two cyclohydrolase domains of the dimeric enzyme (14). This nucleotide was modeled first as GMP and later, after analysis by HPLC and mass spectrometry, as XMP (18). Curiously, however, when the crystals were soaked with XMP, a nucleotide was found in each cyclohydrolase domain but with one molecule (but not the other) having a distinctly nonplanar exocyclic oxygen on one ring (16).
The three-dimensional structure of the ATIC enzyme from M. tuberculosis (MtbATIC), presented here, reveals a similar phenomenon: a nucleotide bound to one, but not both, of the cyclohydrolase domains of this dimeric enzyme. After mass spectrometric analysis, followed by isolation of the nucleotide and characterization by NMR, we have identified it as CFAIR. This has a mass of 366.03 atomic mass units (theoretical and observed) as a singly charged negative ion, which is 3 atomic mass units greater than that of XMP (363.03 atomic mass units). We infer that CFAIR is produced by formylation of an earlier intermediate of the pathway, CAIR, probably by ATIC under the conditions of overexpression in E. coli. The natural substrate for the formyl transfer step catalyzed by ATIC (AICAR) differs from CAIR only in having a 4-carboxamide group in place of a 4-carboxyl group, so CAIR could certainly be accommodated in the formyltransferase active site.
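The mass comparison above can be checked with a short calculation of monoisotopic masses for the singly deprotonated ions. This is a sketch under stated assumptions: the elemental formulas are not given in the text and are inferred here from the compound names (CFAIR taken as CAIR, C9H14N3O9P, plus CO from formylation, giving C10H14N3O10P; XMP taken as C10H13N4O9P), and the function names are invented for illustration.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
}
PROTON = 1.00727647  # mass of H+ (u)


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_deprotonated(formula):
    """m/z of the singly charged [M-H]- ion."""
    return neutral_mass(formula) - PROTON


# Assumed formulas, inferred from the compound names rather than stated in the text.
CFAIR = {"C": 10, "H": 14, "N": 3, "O": 10, "P": 1}
XMP = {"C": 10, "H": 13, "N": 4, "O": 9, "P": 1}

cfair_mz = mz_deprotonated(CFAIR)
xmp_mz = mz_deprotonated(XMP)
print(f"CFAIR [M-H]-: {cfair_mz:.2f}")
print(f"XMP   [M-H]-: {xmp_mz:.2f}")
print(f"difference:   {cfair_mz - xmp_mz:.2f}")
```

Under these assumed formulas the calculation reproduces the values quoted in the text, 366.03 and 363.03, with a difference of 3.00. The assumed CFAIR formula also contains 24 non-hydrogen atoms, which agrees with the 24 ligand atoms used in the structural comparison.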
The CFAIR ligand in molecule A of the apo-MtbATIC structure is represented by excellent continuous electron density (Fig. 3A), with no additional electron density present between the 4-carboxyl and 5-formylamino substituents to suggest that they should be joined into a ring structure as in XMP. Analysis of the related nucleoside FAIRs shows that some rotation about the C5-N5 bond can occur (38). This relieves the intramolecular contact (2.9 Å) between the formyl carbon C7 and the oxygen atoms of the 4-carboxyl group. Significantly, it means that the formyl oxygen, O7, lies 0.3-0.5 Å below the plane of the imidazole ring, displaced toward the peptide nitrogen of Ile-132, from which it accepts a hydrogen bond of 2.8 Å. In the AICAR complex, the electron density for the ligand in the cyclohydrolase domain of molecule A is again consistent with CFAIR, with the same conformation, in which O7 is again 0.3-0.5 Å below the imidazole ring plane and hydrogen-bonded to Ile-132 NH (2.9 Å). Evidently, CFAIR is not displaced from this site by AICAR. Indeed, the bound CFAIR molecule is fully enclosed by the protein structure, with the side chain of Tyr-111 flipped over the top to shield it from solvent and every polar atom involved in at least one hydrogen bond with surrounding protein atoms.
Comparison with the avian ATIC structure is instructive. Although the adventitious ligand found in the cyclohydrolase domain of molecule A of that structure was modeled as XMP, the authors noted that the C2 carbonyl group was unexpectedly bent out of the ring plane (16). They suggested that either the ligand was not XMP or it was distorted in the avian structure by induced fit. The carbonyl oxygen O2 of XMP corresponds structurally to the formylamino oxygen O7 of CFAIR, and superposition of these ligands in the avian and Mtb ATIC structures shows that in both cases the oxygen is out-of-plane to a similar degree, displaced toward the Ile-132 peptide NH (Ile-127 in the avian ATIC) at the N terminus of helix α8 (residues 131-145 in MtbATIC). If this ligand in the avian apo-ATIC structure was indeed CFAIR, as we suggest, the conformational difference between the two "XMP" ligands in the avian ATIC/XMP/AICAR structure (Protein Data Bank code 1m9n) would be explained by the "bent" ligand being CFAIR and the planar one being XMP.
Assuming that the adventitious ligand in both the Mtb and avian ATIC structures is CFAIR and is carried through expression, purification, and crystallization, it evidently has high affinity for the cyclohydrolase active site. In the case of the avian enzyme it is not displaced by the 10-fold molar excess of XMP used to produce the avian ATIC/XMP/AICAR crystal structure (15), and in MtbATIC it is not displaced by AICAR. XMP is known to be a micromolar IMP cyclohydrolase (IMPCHase) inhibitor, with an inhibition constant Ki = 0.12 µM (39), and although CFAIR has not to our knowledge been synthesized or tested, its nonformylated precursor CAIR is a 10 µM inhibitor of ATIC (40). These observations suggest that CFAIR is an excellent IMPCHase inhibitor, with a higher affinity than XMP. The affinity for CFAIR presumably arises because it is isostructural with FAICAR, the true IMPCHase substrate, differing only in the substitution of the carboxyamide -NH2 of FAICAR by the carboxylate oxygen of CFAIR. CFAIR cannot cyclize, however, because cyclization proceeds via nucleophilic attack by the -NH2 nitrogen on the formyl carbon (40). The interaction of the formyl oxygen of CFAIR with the N terminus of helix α8 in the M. tuberculosis and avian ATIC structures also supports the FAICAR-binding mode proposed by Wolan et al. (16) from the avian and human ATIC-XMP complexes.
A persistent feature of the M. tuberculosis, avian, and human ATIC crystal structures is the preference for ligand binding in only one of the two IMPCHase active sites of the ATIC dimer, an apparent half-the-sites reactivity. This is coupled with active site plasticity, in which sections of the polypeptide surrounding the bound ligand move 2-3 Å inward to enclose it, a tyrosine side chain (Tyr-111 in MtbATIC) flips over the top, and a lysine (Lys-72 in MtbATIC) moves in to interact with both the ligand and the Tyr side chain. In contrast, in those structures where a second ligand occupies the other active site (AICAR for MtbATIC and XMP for avian ATIC), binding in that site appears to be much weaker, as judged by the electron density. The site remains more open, and the Lys and Tyr side chains remain 15-20 Å apart. Consideration of the dimer interface suggests an explanation. The interface between the two IMPCHase domains involves three helices from each domain (α3 (residues 75-83) and the preceding loop 71-74; α8 (residues 131-145); and α10 (residues 170-201); residue numbering as in MtbATIC). These pack as three antiparallel pairs (Fig. 4). The loop 71-74 plays an important role in ligand binding; Lys-72 from this loop moves to engage with the ligand phosphate and ribose moieties and with Tyr-111, and Thr-73 hydrogen bonds to the ligand imidazole nitrogen N3 (through OG1) and the 4-carboxyl (or 4-carboxamide) group. The N terminus of helix α8 also contacts the ligand, hydrogen bonding to the 5-formylamino substituent. Thus, although the movements induced by ligand binding are relatively small, they involve four of the six principal components of the dimer interface. Movement of the helices of one monomer, toward the ligand and away from the interface, provides a mechanism whereby binding in the other site could be disfavored.
Just as MtbATIC is a potential target for the design of antituberculosis drugs, so the human ATIC has been explored as an attractive target for developing new anti-tumor agents (8,41). A number of compounds, targeted against either the formyltransferase or cyclohydrolase active site, have already been synthesized and tested for their potency against human ATIC. The sulfonyl antifolate inhibitors BW1540 and BW2315 (7), which target the formyltransferase active site, show striking inhibition in the nanomolar concentration range. Crystallographic analyses suggest that this high potency is driven primarily by the interactions of the sulfonyl group within the oxyanion hole formed by the main chain amides of Ser-450 (Asn-446 in Mtb) and Arg-451 (Arg-447 in Mtb). In the MtbATIC/AICAR structure, a phosphate ion occupies this site, confirming its affinity for anionic oxygen species and suggesting that the sulfonyl compounds developed for human ATIC could also be potent inhibitors of the M. tuberculosis enzyme. Comparison of the M. tuberculosis and human enzymes shows that most of the residues that contribute to the binding and stability of these two compounds in human ATIC are conserved in MtbATIC (Lys-283, Asn-427, Arg-447, Val-448, Asp-477, and Ser-496, M. tuberculosis numbering).
Sulfonyl-based compounds designed to mimic reaction intermediates of the cyclohydrolase reaction have also been developed (9) and inhibit the cyclohydrolase site of avian ATIC in the low micromolar concentration range. These compounds are closely related to XMP and bind in a similar way. Comparisons of their binding mode to the avian ATIC (9) with the binding of CFAIR in MtbATIC show that one of the sulfonyl oxygens binds at the same site as the formyl oxygen of CFAIR, hydrogen-bonded to the N terminus of helix α8. Interestingly, this study showed that a nucleoside of the sulfonyl compound bound almost as well as the corresponding nucleotide, with inhibition constants of 0.23 and 0.15 µM, respectively. Given the highly favorable interactions that CFAIR forms with MtbATIC, a CFAIR nucleoside could be an effective inhibitor of the M. tuberculosis enzyme and a starting point for the development of more potent inhibitors of this enzyme family.