Crystal Structure of Bacillus subtilis Guanine Deaminase

Guanine deaminase, a key enzyme in nucleotide metabolism, catalyzes the hydrolytic deamination of guanine into xanthine. The crystal structure of the 156-residue guanine deaminase from Bacillus subtilis has been solved at 1.17-Å resolution. Unexpectedly, the C-terminal segment is swapped to form an intersubunit active site and an intertwined dimer with an extensive interface of 3900 Å² per monomer. The essential zinc ion is ligated by a water molecule together with His53, Cys83, and Cys86. A transition state analog was modeled into the active site cavity based on the tightly bound imidazole and water molecules, allowing identification of the conserved deamination mechanism and specific substrate recognition by Asp114 and Tyr156′. The closed conformation also reveals that substrate binding seals the active site entrance, which is controlled by the C-terminal tail. Therefore, the domain swapping has not only facilitated the dimerization but has also ensured specific substrate recognition. Finally, a detailed structural comparison of the cytidine deaminase superfamily illustrates the functional versatility of the divergent active sites found in the guanine, cytosine, and cytidine deaminases and suggests putative specific substrate-interacting residues for other members such as dCMP deaminases.

Guanine deaminase (GD) catalyzes the hydrolytic deamination of guanine into xanthine and ammonia, thereby irreversibly removing the guanine base from the pool of guanine-containing metabolites. Our sequence analysis suggests that two types of GDs have evolved separately. Plant, Caenorhabditis, archaeal, and some bacterial GDs belong to the cytidine deaminase (CDA) superfamily (1,2). For example, the Bacillus subtilis GD (bGD) has been shown to be inducible with purines as nitrogen sources (3,4). On the other hand, mammalian, insect, fungal, and some bacterial GDs belong to the TIM barrel metallohydrolase superfamily (5,6). GD is the only identified enzyme in mammals that can directly deaminate a base, and its diagnostic usefulness has been well documented, for example, in the detection of hepatoma and the identification of donor blood infected with the hepatitis C virus (7,8). The expression of mammalian GDs has been shown to be tissue-specific and to fluctuate during development, and various GD activities have been found in cancerous breast and kidney tissues (9-12). Different human GD isoforms have been identified and shown to modify neurotransmitter receptors at the synaptic sites during neuronal development (13). Thus, GD is a potentially attractive target for applications such as drug design and diagnostics.
In addition to the GDs mentioned above, the purine/pyrimidine deaminases in the CDA superfamily include CDAs, fungal cytosine deaminases, dCMP deaminases (dCMPDs), riboflavin biosynthesis proteins (RibGs), and RNA editing enzymes containing either an adenosine deaminase or a cytidine deaminase domain (1, 2, 14-17). These deaminases catalyze the zinc-assisted conversion of the amino group of the cytosine, guanine, or adenine moiety into a keto group. The substrates in the CDA superfamily are made up of similar building blocks, namely base, ribose, and phosphate (Fig. 1). A major challenge is to understand how nature has evolved the CDA fold of these various deaminases to act on their substrates. Two questions may be posed. First, are similar or dissimilar residues used to interact with the common moieties of the substrates? Second, can the specific substrate-interacting residues for each deaminase be predicted through combinations of structural and bioinformatics methods? To gain structural insights into the substrate specificity and evolution of bGD, we have solved the enzyme structure at 1.17-Å resolution.

EXPERIMENTAL PROCEDURES
The protein crystallization, diffraction data collection, phase determination using selenium multiwavelength anomalous dispersion, and the building of an initial dimeric model have been described in our preliminary report (18). The structure then underwent straightforward refinement against data to 1.17-Å resolution using CNS (19). Statistics for the refined model are shown in Table I. More than 92% of the residues are in the most favored regions of the Ramachandran plot.
* This study was supported by National Science Council Grant NSC 92-2320-B-010-071. The synchrotron radiation experiments were performed at the National Synchrotron Radiation Research Center, Taiwan, the Photon Factory, Tsukuba, Japan, and the SPring-8, Sayo, Japan.
To identify the putative substrate recognition residues for some members, a comparative analysis of the available crystal structures in the CDA superfamily was first carried out, and then sequence similarity searches were conducted with PSI-BLAST (24). Multiple sequence alignments of 25 and 32 homologous sequences for dCMPD and RibG, respectively, were performed with ClustalW (25). This was followed by manual editing according to the structural information and secondary structure prediction using PSI-PRED (26). Finally, the conserved residues in each family were mapped onto the known structures to reveal whether they are potentially located near the active site cavity.
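The step of picking out conserved residues from a multiple sequence alignment can be sketched in code. The following is a minimal illustration only, not the procedure actually used in this work: the function name, the toy alignment, and the 0.8 conservation cutoff are all hypothetical, and the input is assumed to be already aligned.

```python
from collections import Counter

def conserved_columns(alignment, threshold=0.8):
    """Return (column index, residue, fraction) for conserved columns.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    threshold: hypothetical cutoff for calling a column conserved.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        counts.pop("-", None)  # a gap is not a residue type
        if not counts:
            continue
        residue, n = counts.most_common(1)[0]
        # Gapped sequences still count in the denominator.
        fraction = n / len(alignment)
        if fraction >= threshold:
            hits.append((col, residue, fraction))
    return hits

# Toy alignment (hypothetical sequences, for illustration only).
toy = ["AHAEC", "GHVEC", "SHLEC", "THIE-"]
for col, residue, fraction in conserved_columns(toy):
    print(col, residue, f"{fraction:.2f}")
```

Columns that pass such a filter would then be inspected on a known structure to see whether they lie near the active site cavity; the cutoff is a free parameter and would need tuning for real families.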

RESULTS AND DISCUSSION
The Overall Structure-Analytical ultracentrifugation experiments demonstrated that the enzyme exists in solution as a homodimer. In the crystal there is one dimer per asymmetric unit, and there are no significant differences between the two subunits except for the N-terminal seven residues and the side chains of several arginines and lysines (root mean square deviation of 0.82 Å for all protein atoms of residues 8-156). The current model contains two extra residues (His(−2) and Ala(−1)), residues 1-156 from subunit A, and residues 2′-156′ from subunit B.
Unexpectedly, bGD forms an intertwined dimer through C-terminal domain swapping (Fig. 2). The protein structure consists of a central five-stranded β-sheet (β1-β4 and β5′ of the adjacent subunit) with the strand order 2, 1, 3, 4, and 5′ and with β1 running antiparallel to the other strands. The β-sheet is sandwiched by helices αA, αD1, and αE′ on one side and helices αB, αC, and αD2 on the other side. The helices αD1 and αD2 extend away from the subunit core and wrap around the adjacent subunit, leading to the C-terminal segment, residues 123-156, which is swapped. Consequently, the β5 strand of one chain forms hydrogen bonds to the β4 strand of the other in a parallel manner, thereby completing the five-stranded β-sheet of the CDA fold. The swapping results in a very extensive intersubunit interface that buries 3900 Å² of the 10,200-Å² molecular surface area per monomer.
More than 45 residues from each subunit, including many conserved residues in the GD family, are involved in dimer formation. The dimer is stabilized mainly by the formation of the wide interhelical hydrophobic packing of helices αA and αD1 with αE′ and the packing of αB, αC, and αD2 with αB′, αC′, and αD2′. In addition, there are 26 direct hydrogen bonds between the protein atoms, including salt bridges between Lys8 and Glu139, Asp49 and Arg60, and Arg94 and Asp113. Interestingly, the side chains of the conserved Arg60 from both subunits stack very well, with a distance of 3.6 Å between the guanidino groups.
The Active Site Architecture and Substrate Binding-The active site contains one tightly bound metal ion, even though no metal ions were added either during protein purification or crystallization. The metal ion was identified by a very strong spherical electron density in the Fourier map and was assigned as zinc based on zinc anomalous data (18). The zinc ion is tetrahedrally coordinated by His53 Nδ1 (2.06 Å), Cys83 Sγ (2.34 Å), Cys86 Sγ (2.28 Å), and a water molecule, WAT1 O (2.03 Å) (Fig. 3A).
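To put the size of the interface in perspective, the buried fraction of each monomer's surface can be worked out from the two areas quoted above. The snippet below is a small arithmetic sketch; the function name is illustrative and not part of any crystallographic package.

```python
def buried_fraction(buried_area, total_area):
    """Fraction of a monomer's molecular surface buried in the interface."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return buried_area / total_area

# Areas quoted in the text, in square angstroms per monomer.
fraction = buried_fraction(3900.0, 10200.0)
print(f"{fraction:.1%} of each monomer surface is buried")
```

By this measure, somewhat more than one-third of each monomer's surface lies in the dimer interface.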
There was a significant electron density peak for a five-atom aromatic ring in the 2Fo − Fc and Fo − Fc electron density maps (Fig. 3A). The signal was assigned as an imidazole from elution off the nickel column, because it occupies the putative position of the imidazole group of the substrate guanine with extensive interactions (Fig. 3). The N1 atom interacts with Asn42 Nδ2 (3.29 Å) and Tyr156′ O (2.90 Å), and the N3 atom contacts Asp114 Oδ1 (2.86 Å). The imidazole ring of His53 and the phenyl ring of Phe26 stack on the imidazole ring with an interplanar distance of 3.3-3.4 Å. The C2 atom also makes close contacts with Trp92′ C2 (3.63 Å), Phe112 Cδ1 (3.89 Å), and Tyr156′ Cε1 (3.52 Å). Three strong peaks in the active site were assigned as water molecules (Fig. 3A) (27).
TABLE I. Statistics for the refined model. a Values in parentheses are for the highest resolution shell.
FIG. 1. Purine/pyrimidine substrates recognized by members of the CDA superfamily except for DHX, a transition-state analogue for GD. The target amino group is printed in bold and italics.
On the basis of the tightly bound WAT1, WAT2, and imidazole molecules described above, the transition state analogue 1,2-dihydroxanthine (DHX) was modeled into the active site (Figs. 1 and 3). Hypoxanthine is expected to be converted by bGD into DHX, because similar inhibitors have been used in previous crystallographic studies of Escherichia coli CDA (eCDA) and yeast cytosine deaminase (yCD) (1,2). After modeling, energy minimization was performed with CNS as a structural refinement. The nucleophilic 2-OH group of the purine ring coordinates to the catalytic zinc ion (2.05 Å) and interacts with Glu55 Oε2 (2.58 Å) and Cys83 N (2.98 Å). Additionally, the N1 atom makes close contact with Glu55 Oε1 (2.73 Å), the N3 atom with Asp114 Oδ2 (2.84 Å), the O6 atom with Ala54 N (3.11 Å) and Asn42 Nδ2 (2.96 Å), the N7 atom with Asn42 Nδ2 (3.29 Å) and Tyr156′ O (2.87 Å), and the N9 atom with Asp114 Oδ1 (2.82 Å). In addition to the extensive hydrogen bond network, the imidazole ring of His53 and the phenyl ring of Phe26 stack on the purine ring with an interplanar distance of 3.3-3.4 Å. Also, the C8 atom of the purine ring makes close contacts with Trp92′ C2 (3.65 Å), Phe112 Cδ1 (3.84 Å), and Tyr156′ Cε1 (3.50 Å). The hydrogen bonds between the substrate and the conserved residues are virtually identical to those found in yCD and CDAs (1,2,28).
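The polar contacts listed above can be tallied as a simple bookkeeping exercise. The sketch below only restates the distances given in the text; the 3.3-Å donor-acceptor cutoff is an assumption made for illustration, the atom labels follow PDB-style names (for example OE2 for Oε2), and the zinc coordination distance is left out because it is not a hydrogen bond.

```python
# Contacts between the modeled DHX and the protein, as listed in the text:
# (DHX group, protein atom, distance in angstroms).
contacts = [
    ("OH at C2", "Glu55 OE2", 2.58),
    ("OH at C2", "Cys83 N", 2.98),
    ("N1", "Glu55 OE1", 2.73),
    ("N3", "Asp114 OD2", 2.84),
    ("O6", "Ala54 N", 3.11),
    ("O6", "Asn42 ND2", 2.96),
    ("N7", "Asn42 ND2", 3.29),
    ("N7", "Tyr156' O", 2.87),
    ("N9", "Asp114 OD1", 2.82),
]

MAX_HBOND = 3.3  # assumed distance cutoff, not a value from the paper

hbonds = [c for c in contacts if c[2] <= MAX_HBOND]
shortest = min(hbonds, key=lambda c: c[2])
longest = max(hbonds, key=lambda c: c[2])

print(len(hbonds), "hydrogen bonds within the cutoff")
print("shortest:", shortest)
print("longest:", longest)
```

With this cutoff all nine listed contacts qualify, which matches the nine direct hydrogen bonds between protein and inhibitor stated in the legend to Fig. 3.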
Structural Conservation in the CDA Superfamily-A structural similarity search by DALI (29) reveals that bGD displays significant structural similarity to yCD, B. subtilis CDA (bCDA), eCDA, and subdomain 2 of the chicken AICAR transformylase domain, with Z-scores of 14.2, 9.9, 8.9, and 8.1, respectively (1,2,28,30). Two classes of CDAs have been identified: (i) a dimeric and pseudo-tetrameric form such as eCDA that utilizes one histidine and two cysteines for zinc ion coordination; and (ii) a tetrameric form such as bCDA that uses three cysteines instead. Detailed structural comparisons reveal that the conserved structural elements in the CDA fold include the strands β1-β5 and the helices αA-αC (Fig. 4A). The main chain atoms of the 65-70 structurally equivalent residues overlay with a root mean square deviation of 1-1.35 Å and share 8-24% sequence identity. The strong conservation of the tertiary structures of these domains suggests that they are evolutionarily descended from a common structural fold, the CDA fold. To date, all enzymes with this CDA fold exist as oligomers and lack disulfide bonds. Similarly, the active site cavity in the CDA superfamily is mainly made up of the αA-β1, β2-αB, β3-αC, and β4-αD loops and the C-terminal tails.
The dimeric yCD has an intramolecular active site (2), whereas the dimeric bGD contains an intersubunit one. In contrast, the active site of the tetrameric bCDA is built from three subunits (28). The active sites of bGD, yCD, eCDA, and bCDA share virtually identical interaction networks between the attacking water molecule, the zinc ion, the three protein ligands, the base glutamate, and the common moiety of the pyrimidine ring (Fig. 4B). The absolute conservation of a proline residue prior to the conserved cysteine in the CDA superfamily (Pro82 in bGD) may reflect the need to fix the orientation of the backbone O (Glu81 O in bGD) for interaction with both the amino group and the leaving ammonium molecule (27). Thus, the presence of the conserved signatures HXE (or CXE) and PCXXC in the CDA superfamily indicates a similar zinc-assisted deamination mechanism.
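The two signatures can be written as regular expressions and located in a protein sequence. The following is a minimal sketch under stated assumptions: the function name and the toy sequence are hypothetical, X is taken to mean any single residue, and a match alone does not establish that a protein is a CDA-fold deaminase, since a motif as short as HXE will also occur by chance.

```python
import re

# Signatures named in the text, with X read as "any residue".
# A lookahead is used so that overlapping matches are also reported.
SIGNATURES = {
    "H/CXE": re.compile(r"(?=[HC].E)"),
    "PCXXC": re.compile(r"(?=PC..C)"),
}

def find_signatures(sequence):
    """Return {signature name: [1-based start positions]} for a sequence."""
    return {
        name: [m.start() + 1 for m in pattern.finditer(sequence)]
        for name, pattern in SIGNATURES.items()
    }

# Hypothetical toy fragment, not a real deaminase sequence.
toy = "MKTHAEGLLSPCGMCAQ"
print(find_signatures(toy))
```

In bGD the corresponding residues are His53 and Glu55 for the first signature and Pro82, Cys83, and Cys86 for the second, so a real search would be expected to report start positions 53 and 82.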
Therefore, on the basis of the structural studies on bGD, yCD, and CDAs (2, 27), a deamination mechanism for bGD is proposed as outlined in Scheme 1. The nucleobase guanine binds to the active site, and its direct contacts with the C-terminal tail induce the tail to cover the active site entrance, sequestering the reaction from solvent. Glu55 serves as a proton shuttle, abstracting a proton from the zinc-activated water to form the attacking hydroxide ion on the one hand and, on the other hand, protonating the N1 of guanine to form the tetrahedral intermediate. Subsequently, Glu55 also assists the proton transfer from the 2-OH group to the 2-NH2 group to facilitate the cleavage of the carbon-nitrogen bond. The newly formed xanthine may move toward the zinc ion for ligation, and this would weaken its interaction with the C-terminal tail, allowing its release from the active site.
FIG. 3. … The active site residues and the imidazole are shown as ball-and-stick representations, and the modeled inhibitor (DHX) as magenta sticks. The zinc ion and the water molecules are shown as magenta and red spheres, respectively. B, stereo view of the interaction networks in the active site. There are nine direct hydrogen bonds between the protein molecule and the inhibitor (see "Results and Discussion" for a detailed explanation). C, molecular surfaces of one bGD subunit are colored for electrostatic potential from −10 kBT (red) to 10 kBT (blue), whereas the surfaces of the other subunit are displayed explicitly as worms. The zinc ion is embedded at the deepest part, whereas the inhibitor lies near the cavity opening (Fig. 3C was created with a similar orientation to Fig. 2B). Single letter amino acid abbreviations are used with position numbers.
FIG. 4. … (AICAR2 and AICAR4). The secondary structure elements for each enzyme, derived from the crystal structures or predicted by PSI-PRED, are boxed, and those for bGD are labeled (s.s). Residues conserved in the CDA superfamily and involved in deamination are shaded in cyan, whereas the residues for the conserved hydrophobic core are shaded in yellow. In addition, the residues conserved in each member and involved in the substrate binding are shaded in red, whereas the residues that form hydrogen bonds with protein atoms for structural integrity are shaded in blue. The conserved glycine and proline residues in each family are shaded in gray. The large insertion (residues 470-532, subdomain 3) in AICAR4 is indicated by ***, is not involved in the catalytic activity, and is present in the eukaryotic enzymes only. Single letter amino acid abbreviations are used with position numbers.
Structural Divergence in the CDA Superfamily-One of the major variations is the state of protein oligomerization. The dimeric bGD cannot be superimposed on the dimeric yCD and tetrameric bCDA because of the different relative orientations between the subunits. However, all of these deaminases utilize the helical layer(s) and the C-terminal tail for oligomerization. Only one helical layer contributes to the interface in CDAs and yCD, whereas both helical layers are used in bGD because of the domain swapping.
A second variation is the C-terminal segment beyond strand β4, which is hypervariable and contributes to substrate recognition through a switching of the active site entrance and/or direct interaction. To date, all of the available structures are in a closed conformation. In bGD, yCD, and bCDA, the C-terminal tail forms a "flap" across the entrance that regulates both substrate binding and product release (2, 28) (Fig. 3C). However, the active site entrance switch in eCDA is more complicated, perhaps acting through three loops from the adjacent monomer, and remains unclear (1). The C-terminal tails in both yCD and bGD contain conserved substrate recognition residues such as Asp155 in yCD and Tyr156 in bGD. These residues seal the active site entrance upon substrate binding and thus limit the size of the substrate binding pocket for the nucleobases. On the other hand, the highly flexible C-terminal tail of bCDA would seem to enlarge the active site cavity to accommodate the larger nucleoside substrate (2,28). Neither cytidine nor guanine can be accommodated within the active site of the yCD because of steric hindrance from Asp155 (Fig. 4B). Consistent with this observation, our enzymatic assay shows that yCD cannot catalyze the deamination of either cytidine or guanine. Similarly, cytidine is not a substrate for bGD, because there is a steric clash with Tyr156. Therefore, the diverse C-terminal tails might make a major contribution to the structural plasticity and functional diversity of the CDA superfamily.
In addition to the diverse C-terminal segment, each family member ought to contain some other unique substrate-recognition residues such as Glu44 in bCDA and Asp114 in bGD. Interestingly, the conserved asparagine residue at the C-terminal end of the β2 strand in the CDA superfamily seems to form different interactions with the various substrates (Fig. 4B). It interacts with the 3′-OH of the ribose of cytidine in CDAs, the O2 of cytosine in yCD, and the O6 and N7 of guanine in bGD. The imidazole rings of the conserved histidines for zinc ligation, His62 in yCD and His53 in bGD, stack on the base ring, whereas the equivalent residue in eCDA does not. The hydrophobic edge of the nucleobase ring faces a hydrophobic region that includes an aromatic cluster. For example, these clusters are Phe71, Tyr126, and Phe165′ in eCDA, Phe24 and Phe126′ in bCDA, Phe114 and Trp152 in yCD, and Trp92′, Phe112, and Tyr156 in bGD (1, 2, 28) (Fig. 3B).
Prediction of the Specific Substrate-interacting Residues in dCMPD and RibG-Comparative analysis of bGD, yCD, and CDAs reveals that the residues conserved across the superfamily are either responsible for the general deamination mechanism, such as the H(C)XE and PCXXC signatures, or needed for structural maintenance, e.g. the hydrophobic packing between the five β-strands (Fig. 4C). Other residues that are conserved within a family are involved in substrate specificity, such as Asp114 in bGD, and yet others are required for structural integrity such as oligomerization, e.g. the intersubunit hydrophobic contacts and salt bridges. A large number of the conserved residues are responsible for the structural plasticity of the CDA superfamily, whereas only a small number of residues are directly involved in its functional versatility.
The dUMP produced by dCMPD is the major substrate source for thymidylate synthase during the de novo synthesis of thymidine nucleotides, and, clinically, the levels of this enzyme have been found to be elevated in the serum of patients in various disease states (14). This deaminase is also involved in the action as well as the metabolism of various antiviral and antitumor agents (31). If the human dCMPD is considered, the specific dCMP binding residues are predicted to be Lys115 and Gln119, involved in forming the adjacent active site and corresponding to Gln91 in bCDA and Trp92 in bGD. The third residue would seem to be Tyr132, because it corresponds to Phe114 in yCD and Asp114 in bGD (Fig. 4C). On the other hand, the conserved Tyr53 in the strand β2 might form a hydrogen bond with Ser30 in αA to allow structural stabilization, because the corresponding conserved His50 in yCD interacts with Glu21 and Glu28. The highly conserved Asn88 in the αB helix (Asn108 in the T4 phage enzyme) may be involved in hexamer formation, allowing allosteric regulation by dCTP and dTTP. This is because the phage mutants F112A and R115E (Lys93 in human dCMPD) exist only as dimers with a loss of allosteric regulation (32). In addition, all conserved residues in the αB helix in bGD, yCD, and bCDA, except for the signature HXE, are involved in structural integrity.
Riboflavin synthesis from GTP is essential for basic metabolic functions of gastric pathogens such as Helicobacter pylori, and the genes involved (RibA-RibG) may be targets for an antibacterial drug. Most RibG proteins have both deaminase and reductase activities and are involved in the second and third steps of riboflavin biosynthesis. A deletion analysis of B. subtilis RibG (bRibG) has demonstrated that the N-terminal 147 and the C-terminal 248 residues are sufficient for deaminase and reductase activity, respectively (15). In the bRibG deaminase domain, the specific substrate binding residues are predicted to be Asn23, His76, and Lys79 in the unique insertion between the two conserved zinc ligand cysteines and Asp101. Evidently, CDA, dCMPD, and RibG utilize different residues to interact with the common ribose and phosphate groups of their substrates (Fig. 4C).
More importantly, RNA editing proteins contain either a cytidine deaminase or an adenosine deaminase domain (16,17). Prediction of the functional residues in RNA editing enzymes is still under investigation. Accurate prediction would provide a structural basis for mutational analysis to elucidate the functional roles of the critical residues.
SCHEME 1. The proposed catalytic mechanism for bGD.
The Function and Mechanism of Domain Swapping in bGD-Liu and Eisenberg defined domain swapping as the exchange of an identical structural segment, and this exchange has been proposed as a mechanism for protein oligomerization and for the assembly of protein aggregates such as amyloid formation (33). Our bGD structure, the first identified domain-swapped example in the CDA superfamily, provides some new insights and allows a greater understanding of the functions and mechanisms involved in domain swapping. Circular dichroism measurements revealed the melting temperature, Tm, to be an estimated 62°C for bGD and ~50°C for yCD, implying that the domain swapping in bGD enhances structural stability. Based on the yCD structure, an analogous dimeric bGD model without domain swapping was constructed before the structure was determined. The unswapped bGD model shows Tyr156 occupying the Asp114 position and Asp114 located far away from the active site. This suggests that the domain swapping in GD recruits the specifically required guanine-interacting residues into the active site. Therefore, domain swapping in GD contributes not only to oligomerization and structural stability but also to substrate specificity.
The four residues between the strands β4 and β5 in bCDA form a turn, which leads to β4 and β5 becoming antiparallel, and the C-terminal tail then completes the active site of the adjacent molecule (Fig. 4A). The 17 residues in yCD form an extra αD, causing the strands β4 and β5 to be parallel and the C-terminal tail to fold back toward the active site within the same subunit. On the other hand, the 27 residues in bGD that form the extra helices αD1 and αD2 extend away from the structural core and wrap around the adjacent subunit, resulting in a domain swap. In other proteins, the shortening of hinge loops has been shown to induce domain swapping. For example, deletion of six residues from the hinge loop in staphylococcal nuclease causes the formation of domain-swapped dimers (34). It is not clear how a composite active site created in trans might favor guanine deamination. Nonetheless, bCDA, yCD, and bGD may provide an elegant understanding of the underlying mechanisms that lead to domain swapping.
Conclusion-In summary, the members of the CDA superfamily share a conserved structural core as well as the residues required for the deamination mechanism. However, each member has its unique substrate recognition mechanism, and these mechanisms include the dissimilar substrate-interacting residues and various active site sizes, which are mainly controlled by the diverse C-terminal tail. In each member of the family, a large portion of conserved residues are responsible for the structural plasticity that is needed for functional versatility, and only a small portion of these residues are directly involved in catalysis. Finally, the CDA superfamily may provide an elegant system to study the domain swapping mechanism.