New Family of Deamination Repair Enzymes in Uracil-DNA Glycosylase Superfamily*

DNA glycosylases play a major role in the repair of deaminated DNA damage. Previous investigations identified five families within the uracil-DNA glycosylase (UDG) superfamily. All enzymes within the superfamily studied thus far exhibit uracil-DNA glycosylase activity. Here we identify a new class of DNA glycosylases in the UDG superfamily that lacks UDG activity. Instead, these enzymes act as hypoxanthine-DNA glycosylases in vitro and in vivo. Molecular modeling and structure-guided mutational analysis allowed us to identify a unique catalytic center in this class of DNA glycosylases. Based on unprecedented biochemical properties and phylogenetic analysis, we propose this new class of DNA repair glycosylases that exists in bacteria, archaea, and eukaryotes as family 6 and designate it as the hypoxanthine-DNA glycosylase family. This study demonstrates the structural evolvability that underlies substrate specificity and catalytic flexibility in the evolution of enzymatic function.

DNA is subject to chemical damage under normal and stressed biological conditions (1). A common type of DNA base damage involves deamination of cytosine to uracil, adenine to hypoxanthine, and guanine to xanthine and oxanine. DNA glycosylases play a major role in the repair of deaminated DNA damage (2). The uracil-DNA glycosylase (UDG) superfamily consists of five families based on conserved motifs and structural similarity (3). Structurally, they are organized by a four-stranded β-sheet surrounded by α-helices, whereas functionally, all of the DNA glycosylases within the superfamily studied thus far are proven biochemically to be uracil-DNA glycosylases. Due to the need for removing deaminated cytosine from the genome, almost all organisms contain at least one uracil-DNA glycosylase. Uracil N-glycosylases (UNGs) are family 1 UDGs first discovered in Escherichia coli (4). All characterized UNGs show exquisite specificity toward uracil in both double-stranded and single-stranded DNA (5). Family 2 enzymes are represented by human thymine-DNA glycosylase (TDG), E. coli mismatch-specific uracil-DNA glycosylase (MUG), and a broad substrate specificity TDG from the fission yeast Schizosaccharomyces pombe. This family exhibits more sequence and functional diversity than family 1. The family 3 single strand-selective monofunctional uracil-DNA glycosylase (SMUG1) enzymes are considered hybrids with a MUG/TDG-like motif 1 and a UNG-like motif 2 and are active as uracil-DNA glycosylases and xanthine DNA glycosylases (6,7). Whereas the prokaryotic family 4 UDG enzymes act on both single-stranded and double-stranded uracil-containing DNA, family 5 UDGs are found to be active toward G/U and to a lesser degree toward T/I substrates. Regardless of which family an enzyme belongs to, a common biochemical feature is that they all have uracil-DNA glycosylase activity.
In the course of studying deaminated DNA repair, we identified a new group of DNA glycosylases in the UDG superfamily. Unlike all previously known families, enzymes from this new family possess no uracil-DNA glycosylase activity and instead exhibit repair activity toward hypoxanthine, a deamination product of adenine. The catalytic center is located on a completely conserved asparagine situated in a distinct orientation relative to the glycosidic bond. Based on the distinct repair capacity and active site architecture, we propose this class of repair enzymes as family 6 in the UDG superfamily. The discovery of this new family enzyme, which exists in all three domains of life, underlies the functional and catalytic diversity in the UDG superfamily and has implications for the evolution of repair enzymes.
Cloning, Expression, and Purification of M. barkeri (Mba) DNA Glycosylase-The M. barkeri DNA glycosylase gene (GenBank™ accession number YP_304295.1) was amplified by PCR using the forward primer Mba.F (5′-GGGAATTCCATATGAAAAAACAAGGTTTCCCACCAGTCATT-3′; the NdeI site is CATATG) and the reverse primer Mba.R (5′-CCGCTCGAGACGCTTGAAAGCTTCTGCCCATCCCGATTT-3′; the XhoI site is CTCGAG). The PCR mixture (50 μl) consisted of 8 ng of Mba genomic DNA, 200 nM forward primer and reverse primer, 1× Taq PCR buffer (New England Biolabs), 200 μM each dNTP, and 5 units of Taq DNA polymerase (New England Biolabs). The PCR procedure included a predenaturation step at 94°C for 3 min; 30 cycles of three-step amplification with each cycle consisting of denaturation at 94°C for 40 s, annealing at 60°C for 40 s, and extension at 72°C for 1 min; and a final extension step at 72°C for 10 min. The PCR product was purified with the Gene Clean 2 kit (Qbiogene). Purified PCR product and plasmid pET21a were digested with NdeI and XhoI followed by purification of gene fragments with the Gene Clean 2 kit and ligation according to the manufacturer's instruction manual. The ligation mixture was transformed into competent E. coli strain JM109 cells by electroporation (10). The sequence of the Mba DNA glycosylase gene in the resulting plasmid (pET21a-Mba) was confirmed by DNA sequencing.
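As a quick sanity check on the primer design, the following sketch searches each primer for the recognition sequence of the restriction enzyme named alongside it. The primer strings are taken from the text with the line-break hyphens removed (an assumption about the extraction), CATATG and CTCGAG are the standard NdeI and XhoI recognition sequences, and the helper name `find_site` is hypothetical.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

# Primers written 5' to 3', with line-break hyphens removed (assumption).
PRIMERS = {
    "Mba.F": ("GGGAATTCCATATGAAAAAACAAGGTTTCCCACCAGTCATT", "NdeI"),
    "Mba.R": ("CCGCTCGAGACGCTTGAAAGCTTCTGCCCATCCCGATTT", "XhoI"),
}


def find_site(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer.upper().find(SITES[enzyme])


if __name__ == "__main__":
    for name, (seq, enzyme) in PRIMERS.items():
        pos = find_site(seq, enzyme)
        status = f"found at index {pos}" if pos >= 0 else "not found"
        print(f"{name}: {enzyme} site {status}")
```

Both sites sit a few bases in from the 5′ end of their primers, leaving a short overhang upstream of each cut site.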
To express the C-terminal His₆-tagged Mba glycosylase, the plasmid pET21a-Mba was transformed into E. coli strain BH214 (ung⁻ mug⁻) by electroporation (10). An overnight E. coli culture was diluted 100-fold into Luria-Bertani (LB) medium supplemented with 100 μg/ml ampicillin. The E. coli cells were grown at 37°C while being shaken at 250 rpm until the optical density at 600 nm reached ~0.6. After adding isopropyl 1-thio-β-D-galactopyranoside to a final concentration of 1 mM, the culture was grown at room temperature for an additional 16 h. The cells were collected by centrifugation at 4,000 rpm at 4°C and washed once with precooled sonication buffer.
To purify the Mba DNA glycosylase protein, bacterial cells from a 500-ml culture grown to late exponential phase were harvested by centrifugation at 4,000 rpm for 10 min. The cell pellet was suspended in 7 ml of lysis buffer (20 mM Tris-HCl (pH 7.5), 1 mM EDTA (pH 8.0), 0.1 mM DTT, 0.15 mM PMSF, and 50 mM NaCl) and disrupted by sonication at output 5 for 3 × 1 min with a 5-min rest on ice between intervals using a Sonifier Cell Disruptor 350 (Branson). The lysate was clarified by centrifugation at 12,000 rpm for 20 min and filtered through a 25-mm GD/X syringe filter (Whatman). The supernatant was transferred into a fresh tube and loaded onto a 1-ml HiTrap chelating column (GE Healthcare). The column was washed with 5 ml of chelating buffer A (20 mM sodium phosphate (pH 7.4), 500 mM NaCl, and 2 mM imidazole). The bound protein in the column was eluted with a linear gradient of 0-100% chelating buffer B (chelating buffer A and 500 mM imidazole).
Fractions of the eluate were analyzed by 12% SDS-PAGE, and those fractions containing Mba glycosylase protein (40% chelating buffer B) were pooled. The partially purified protein was then loaded onto a 1-ml HiTrap SP column, washed with 5 ml of HiTrap SP buffer A (20 mM HEPES (pH 8.0), 1 mM EDTA, and 0.1 mM DTT), and eluted with a linear gradient of 0-100% HiTrap SP buffer B (HiTrap SP buffer A and 1 M NaCl). Fractions containing Mba glycosylase (30-50% HiTrap SP buffer B) were pooled and concentrated through Microcon YM 10 (Millipore). The protein concentration was quantified by SDS-PAGE analysis using bovine serum albumin as a standard. The Mba glycosylase protein was stored in aliquots at −80°C. Prior to use, the protein was diluted in an equal volume of 2× storage buffer (20 mM Tris-HCl (pH 8.0), 2 mM DTT, 2 mM EDTA, 400 μg/ml BSA, and 100% glycerol).
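To make the elution window concrete, the sketch below converts a percentage of HiTrap SP buffer B into the corresponding NaCl concentration. It assumes an ideally linear gradient and that buffer A contributes no NaCl, as its stated composition implies; the function name `gradient_concentration` is hypothetical.

```python
def gradient_concentration(percent_b, conc_a, conc_b):
    """Concentration of one component at a given %B in a linear A/B gradient."""
    if not 0 <= percent_b <= 100:
        raise ValueError("percent_b must lie between 0 and 100")
    frac_b = percent_b / 100.0
    return conc_a * (1.0 - frac_b) + conc_b * frac_b


if __name__ == "__main__":
    # HiTrap SP buffers: A has no NaCl, B is buffer A plus 1 M NaCl.
    for pct in (30, 50):
        nacl = gradient_concentration(pct, conc_a=0.0, conc_b=1.0)
        print(f"{pct}% buffer B corresponds to about {nacl:.2f} M NaCl")
```

Under these assumptions the pooled fractions eluted between roughly 0.3 and 0.5 M NaCl.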
Cloning, Expression, and Purification of M. acetivorans DNA Glycosylase-The M. acetivorans DNA glycosylase gene (GenBank accession number NP_615428.1) was amplified by PCR using the forward primer Mac.F (5′-GGGAATTCCATATGATAAAGCGAGGTTTTCCTGCAGTCCTT-3′; the NdeI site is CATATG) and the reverse primer Mac.R (5′-CCGCTCGAGATGCCTGAAAACAGCCTCCCACTCCGATTC-3′; the XhoI site is CTCGAG). The cloning, expression, and purification steps were the same as described above.
DNA Glycosylase Activity Assay-The preparation of fluorescently labeled deaminated base-containing oligodeoxynucleotide substrates was described previously (12). DNA glycosylase cleavage assays were performed at 37°C for 60 min in a 10-μl reaction mixture containing 10 nM oligonucleotide substrate, 100 nM glycosylase protein, 20 mM Tris-HCl (pH 7.5), 5 mM EDTA, and 2 mM 2-mercaptoethanol. The resulting abasic sites were cleaved by incubation at 95°C for 5 min after adding 1 μl of 1 N NaOH. Reactions were quenched by addition of an equal volume of GeneScan stop buffer. Samples (4 μl) were loaded onto a 7 M urea 10% denaturing polyacrylamide GeneScan gel (acrylamide:bisacrylamide = 19:1; 1× TBE buffer (89 mM Tris, 89 mM boric acid, and 2 mM EDTA)). Electrophoresis was conducted at 1,500 V for 1.5 h using an ABI 377 sequencer (Applied Biosystems). Cleavage products and remaining substrates were quantified using GeneScan analysis software.
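The quantification step can be illustrated with a minimal sketch, which is not the authors' analysis pipeline. It assumes that the fraction cleaved is the product peak area divided by the sum of product and substrate areas, and that with enzyme in excess over substrate the time course follows first-order kinetics that run to completion, f(t) = 1 − exp(−kt). The function names and the synthetic time course are hypothetical.

```python
import math


def fraction_cleaved(product_area, substrate_area):
    """Fraction of substrate converted to product, from two peak areas."""
    total = product_area + substrate_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return product_area / total


def apparent_rate_constant(times_min, fractions):
    """Least-squares slope through the origin of -ln(1 - f) versus time."""
    num = 0.0
    den = 0.0
    for t, f in zip(times_min, fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("fractions must satisfy 0 <= f < 1")
        num += t * -math.log(1.0 - f)
        den += t * t
    if den == 0.0:
        raise ValueError("at least one nonzero time point is required")
    return num / den


if __name__ == "__main__":
    # Synthetic time course generated with k = 0.085 per min (illustrative only).
    times = [10, 20, 30, 40, 50, 60]
    fractions = [1.0 - math.exp(-0.085 * t) for t in times]
    print(f"k_app = {apparent_rate_constant(times, fractions):.3f} per min")
```

If the reaction plateaus below complete cleavage, the plateau would have to be fitted as a separate parameter, for example by nonlinear regression.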
Reversion of lacZ Gene in E. coli CC106-The plasmids pET21a-Mba and pET21a-Mba-N39A were digested with EcoRI and XhoI, and the fragment containing the Mba DNA glycosylase gene was cloned into pBluescript SK(+) to generate pBS-Mba and pBS-Mba-N39A. The resulting plasmids were transformed into E. coli BW1506 (CC106 nfi-1::cat). Tester cultures inoculated from a single colony were grown in LB medium at 37°C for 16 h. Overnight cultures (1 ml) were transferred to 4 ml of fresh LB medium and incubated at 37°C for 4-5 h until A₆₀₀ reached 0.6. After adding isopropyl 1-thio-β-D-galactopyranoside to a final concentration of 1 mM, the cultures were incubated at 37°C for an additional 5 h. E. coli cells (2 × 10⁹) were incubated with 30 μl of 1 M NaNO₂ (30 mM final concentration) and 970 μl of 100 mM sodium acetate buffer (pH 4.6) at 37°C for 30 min. Top agar consisting of 0.5% NaCl, 0.6% agar (Difco), and 0.2 mg/ml nutrient broth (Difco) was autoclaved and then maintained at 45°C in a water bath. After centrifugation, the treated cells were suspended in 1 ml of 10 mM MgSO₄. Molten top agar prepared as described above (8 ml) was added to the suspended cells, and the mixtures were immediately overlaid onto a minimal lactose plate prepared according to the recipe described previously (13). Cells were incubated at 37°C for 4 days to allow a few cycles of cell division to fix the mutations in the presence of nutrient broth (14).
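The stated final nitrite concentration follows from a simple dilution calculation, sketched below. It assumes that the volume of the cell pellet is negligible, so that the total volume is 30 μl + 970 μl; the function name `final_concentration` is hypothetical.

```python
def final_concentration(stock_conc, stock_vol, total_vol):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    if total_vol <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_vol / total_vol


if __name__ == "__main__":
    total_ul = 30.0 + 970.0  # nitrite stock plus acetate buffer, in microliters
    nitrite_mM = final_concentration(1000.0, 30.0, total_ul)  # 1 M stock = 1000 mM
    acetate_mM = final_concentration(100.0, 970.0, total_ul)
    print(f"NaNO2: {nitrite_mM:.0f} mM, sodium acetate: {acetate_mM:.0f} mM")
```

The same calculation gives about 97 mM for the sodium acetate buffer in the treatment mixture.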
Molecular Modeling-Pairwise alignment of the amino acid sequence from the Mba glycosylase (GenBank accession number YP_304295.1) and chain A of the 2rba Protein Data Bank structure (human TDG) was performed using ClustalW. This alignment resulted in 19% identity between the two sequences (supplemental Fig. S1). Based on the sequence alignment and the 2rba Protein Data Bank structure, a homology model was constructed for the Mba enzyme using the NEST program (15). The NEST program utilizes an artificial evolution-based algorithm to generate a homology model of a protein based on an input amino acid sequence alignment and corresponding template structure. Default settings were used to develop the model structure of the Mba glycosylase including no tuning or adjustment of the sequence alignment, a short minimization to remove clashes between atoms, and a single round of structural refinement.
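For readers unfamiliar with how a percent identity such as the 19% above is derived from a pairwise alignment, here is a minimal sketch. It assumes identity is counted over columns in which neither sequence has a gap; alignment programs may use a different denominator, so this need not reproduce the ClustalW figure. The two aligned strings are toy sequences, not the Mba or TDG sequences, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned sequences for illustration only.
    a = "MKKQG-FPPV"
    b = "MRKAGDFP-V"
    print(f"{percent_identity(a, b):.1f}% identity")
```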
Phylogenetic Analysis-The phylogenetic tree was generated with a neighbor-joining algorithm within the MEGA 5 software package applied to a multiple sequence alignment produced with the ClustalW program. The parameters for pairwise alignment were as follows: gap opening penalty, 10; gap extension penalty, 0.1. The parameters for multiple alignment were as follows: gap opening penalty, 10; gap extension penalty, 0.2. Other parameters were: protein weight matrix, Blosum; residue-specific penalties, on; hydrophilic penalties, on; gap separation distance, 4; end gap separation, off.
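The core of the neighbor-joining algorithm is its pair-selection criterion, sketched below for a single joining step. This is a textbook illustration rather than the MEGA 5 implementation: the distance matrix is a toy example, not derived from the UDG alignment, and the function name `nj_first_join` is hypothetical.

```python
from itertools import combinations


def nj_first_join(labels, dist):
    """Pick the first pair joined by neighbor-joining and its branch lengths."""
    n = len(labels)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        # Q criterion: pairwise distance corrected for each taxon's divergence.
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    i, j = best_pair
    # Branch lengths from the joined taxa to their new internal node.
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (labels[i], labels[j]), (len_i, len_j)


if __name__ == "__main__":
    labels = ["a", "b", "c", "d", "e"]
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    pair, lengths = nj_first_join(labels, dist)
    print(pair, lengths)
```

A full tree is obtained by replacing the joined pair with its internal node, recomputing the distances to that node, and repeating until three taxa remain.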

RESULTS
Deaminated Base Repair Activities-Family 2 enzymes (MUG/TDG) in the UDG superfamily were previously found only in bacteria and eukaryotic systems. In a search (Psi BLAST) for uracil DNA repair enzymes in archaea, we found a hypothetical protein (GenBank accession number YP_502486.1) in the archaeon Methanospirillum hungatei that showed an insignificant E-value (E = 0.41) with E. coli MUG. However, querying GenBank with this protein led to the identification of a large number of genes in archaea, eubacteria, and eukaryotes with significant homology to UDG superfamily enzymes. Alignment of proteins from this family to the UDG family indicated significant sequence divergence in active site motifs 1 and 2 (Fig. 1A). Most notably, the position equivalent to the catalytic residue in E. coli UNG (family 1; Asp-64), human TDG (family 2; Asn-140), and E. coli MUG (family 2; Asn-18) is occupied by a leucine residue at position 22. To investigate the repair activities of this class of enzymes, we expressed the homologous gene from Mba in an E. coli strain lacking both ung and mug and purified the protein to homogeneity. Interestingly, the recombinant protein completely lacked any detectable uracil-DNA glycosylase activity on all four uracil-containing base pairs and single-stranded uracil-containing DNA (Fig. 1B). Instead, hypoxanthine-DNA glycosylase (HDG) activity was found on all four double-stranded substrates (Fig. 1B). Some minor xanthine DNA glycosylase activity was also detected in this Mba enzyme, but no oxanine DNA glycosylase activity was detected (Fig. 1B). To verify that the lack of UDG activity and existence of HDG activity are not unique to the Mba enzyme, we also investigated the deaminated DNA glycosylase activity of the homologous protein from M. acetivorans. A similar glycosylase pattern was observed (supplemental Fig. S2), indicating that this is a common repair function in this class of enzymes.
Kinetic Analysis and Catalytic Center-Kinetic analysis showed that this glycosylase was most active on the G/I base pair with an apparent rate constant of 0.085 min⁻¹ followed by T/I, A/I, and C/I base pairs (Table 1). No activity was detected on single-stranded inosine-containing DNA (Fig. 1B). The catalytic mechanisms of enzymes from families 1 and 2 have been extensively studied. Asp-64 in E. coli UNG has been identified as the key catalytic residue that activates a water molecule for subsequent nucleophilic attack on the anomeric carbon (16). In E. coli MUG, Asn-18 utilizes the main chain and side chain oxygen to activate the water and the main chain amino group to interact with uracil (17). However, the residue (Asp or Asn) that can perform the catalytic function is notably missing in the Mba enzyme. Instead, the equivalent position is occupied by a hydrophobic leucine residue at position 22 (Fig. 1A and supplemental Fig. S3). We constructed a model of the Mba enzyme based on the crystal structure of human TDG because they share moderate sequence homology (Fig. 2A) (18). The availability of the apurinic/apyrimidinic site in the human TDG structure allowed us to approximate the distances of amino acid residues to the anomeric carbon. The modeled structure is further supported by modeling of the Mba DNA glycosylase sequence based on a recently solved NMR structure of the homologous M. acetivorans DNA glycosylase (supplemental Fig. S4). Within a 10-Å radius of the apurinic/apyrimidinic site, the following potential catalytic residues were identified: Asn-39 (7.08 Å), Asn-84 (9.89 Å), Asn-113 (8.62 Å), Asp-74 (5.55 Å), Asp-25 (9.46 Å), Asp-86 (8.60 Å), Glu-82 (3.05 Å), and Glu-85 (8.28 Å). Among the eight residues identified, only Asn-39 is completely conserved in the Mba enzyme and its homologs (supplemental Fig. S3). Asp-74, Asp-86, and Asn-113 are highly conserved.
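The candidate selection described above amounts to filtering residues by distance and conservation. The sketch below encodes the eight residues and distances listed in the text; the conservation labels follow the text where it is explicit and are marked "not stated" otherwise (an assumption of this illustration, not a claim about supplemental Fig. S3). The function name `shortlist` is hypothetical.

```python
# (residue, distance in angstroms to the abasic site, conservation per the text)
CANDIDATES = [
    ("Asn-39", 7.08, "complete"),
    ("Asn-84", 9.89, "not stated"),
    ("Asn-113", 8.62, "high"),
    ("Asp-74", 5.55, "high"),
    ("Asp-25", 9.46, "not stated"),
    ("Asp-86", 8.60, "high"),
    ("Glu-82", 3.05, "not stated"),
    ("Glu-85", 8.28, "not stated"),
]


def shortlist(candidates, radius=10.0, conservation=None):
    """Residues within the radius, optionally restricted by conservation label."""
    hits = [
        c for c in candidates
        if c[1] <= radius and (conservation is None or c[2] in conservation)
    ]
    return sorted(hits, key=lambda c: c[1])  # nearest residue first


if __name__ == "__main__":
    for name, dist, cons in shortlist(CANDIDATES, conservation={"complete", "high"}):
        print(f"{name}: {dist:.2f} A, conservation {cons}")
```

Note that proximity alone does not single out the catalytic residue: the nearest residue in the model, Glu-82, is not among those the text describes as conserved, which is why conservation is used as the second filter.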
To assess the role of these residues in catalysis, Asp-74, Asp-86, and Asn-113 were substituted by either Ala or Asn. A significant portion of HDG activity on the G/I substrate was retained, suggesting that Asp-74, Asp-86, and Asn-113 did not play a key role in catalysis (Table 1). Given its completely conserved nature, Asn-39 was substituted with Ala, Asp, and Gln. The Ala substitution, which removed the functional group of the Asn-39 side chain, completely abolished the glycosylase activity (Fig. 2B and Table 1). Conversion of the side chain from an amide group to a carboxyl group (N39D) rendered the enzyme active only on the G/I substrate (Fig. 2B and Table 1). Addition of a methylene group into the side chain (N39Q) allowed the enzyme to retain significant activity on the G/I substrate but negligible activity on the T/I substrate (Fig. 2B and Table 1). These results indicate that Asn-39 serves as a catalytic center in the active site environment.
In Vivo Repair-To assess the role the new group of DNA glycosylases may play in vivo, we adopted the lacZ-based genetic assay (8). The E. coli CC106 strain contains a G to A transition mutation at codon 461 that allows for detection of A to G revertants, which has been used to provide evidence to implicate bacterial endonuclease V (encoded by the nfi gene) as an enzyme for the repair of adenine deamination (9). It is known that DNA polymerases predominantly incorporate dCMP to pair with inosine in DNA templates (19). To assess the role that the hypoxanthine-DNA glycosylase activity may play in vivo, both a plasmid containing the WT Mba glycosylase gene and a plasmid containing the N39A mutant gene were transformed into an nfi-deficient CC106 strain. Although the N39A mutant failed to suppress A to G mutations in the CC106 nfi⁻ cells under nitrosative stress, the WT Mba DNA glycosylase reduced the number of revertants by more than 5-fold (Fig. 2C). These results suggest that the hypoxanthine-DNA glycosylase activity in the Mba repair enzyme is responsible for the removal of hypoxanthine in vivo.

DISCUSSION
UDG-less Enzymes in UDG Superfamily-Since the discovery of the first uracil-DNA glycosylase in E. coli in 1974, all the UDG superfamily enzymes studied have possessed UDG activity. Although repair activities on other deaminated bases have been found in different families in the UDG superfamily, UDG activity is always present. This work reports for the first time a class of UDG superfamily enzymes without UDG activity. Instead, they exhibit significant repair activity on hypoxanthine, the deamination product of adenine. Data from the in vivo reversion assay are consistent with the notion that this class of enzymes acts as a hypoxanthine-DNA glycosylase (Fig. 2C). These observations suggest that different UDG families have evolved to possess distinct repair specificities. Similar to the family 1 UNG enzymes, the repair activity of this class of enzymes appears to be limited to hypoxanthine, indicating that it may have a narrow specificity. The HDG activity follows the order of G/I > T/I > A/I > C/I, which is consistent with the stability of inosine-containing base pairs (20-22). Therefore, the tendency of spontaneous base flipping appears to play an important role in determining the repair activity of this class of enzymes. Hypoxanthine differs from uracil in that it lacks a C2-keto group and has an imidazole ring. It remains to be answered how this class of enzymes specifically recognizes hypoxanthine but not uracil. More structural studies are needed to address this question.

Table footnote: The reactions were performed as described under "Experimental Procedures" with 100 nM Mba DNA glycosylase and 10 nM substrate except that samples were withdrawn at 10-min intervals over 60 min. Data are an average of at least two independent experiments. ss, single-stranded.

Unique Catalytic Center-The catalytic mechanisms in families 1 and 2 have been extensively studied. In family 1 UNG enzymes, a completely conserved Asp residue (Asp-145 in human UNG) rotates 120° once bound to a uracil-containing DNA and acts as a general base to activate a bound water molecule (23-25). In family 2 enzymes, the corresponding position is occupied by an Asn residue, which is proposed to activate a bound water molecule through side chain and main chain interactions (17,26). In the Mba enzyme, the equivalent position is occupied by a Leu residue (Fig. 1A). In the modeled structure, Leu-22 is not positioned toward the scissile bond (Fig. 2A). Among residues in the vicinity of the scissile bond, Asn-39 emerged as the key catalytic residue that may perform a catalytic function similar to Asp in family 1 and to Asn in family 2. The evidence supporting this notion includes the following. 1) Asn-39 is completely conserved among the new class of enzymes. 2) Asn-39 is better positioned in the modeled structure to activate a water molecule to potentiate an in-line attack on the glycosidic bond. 3) The N39A mutant completely loses its catalytic activity.
New Family in UDG Superfamily-Based on the unprecedented lack of uracil-DNA glycosylase activity, the presence of HDG activity, and the unique location of the catalytic center, we performed a phylogenetic analysis within the UDG superfamily. As shown in Fig. 3, this new class of enzymes emerged as a distinct group within the superfamily. Therefore, we propose this class of enzymes as family 6 in the UDG superfamily and designate it as the HDG family. The information provided for the family 6 enzymes could help us understand the multiplicity of UDG superfamily enzymes in a variety of organisms. For example, although a family 4 enzyme in M. barkeri may carry out the repair function for cytosine deamination damage by acting as a UDG, the family 6 enzyme described here could perform the necessary repair for adenine deamination by acting as an HDG. In addition to the surprising lack of uracil DNA repair function, the use of a different catalytic center is particularly interesting as it exemplifies the ability of enzymes to adapt to a different catalytic environment to perform a catalytic function. Therefore, this study demonstrates the structural and catalytic flexibility that underlies the functional diversity in enzyme evolution.