Biochemical and Phylogenetic Characterization of the dUTPase from the Archaeal Virus SIRV*

The derived amino acid sequence from a 474-base pair open reading frame in the genome of the Sulfolobus islandicus rod-shaped virus SIRV shows striking similarity to bacterial dCTP deaminases and to dUTPases from eukaryotes, bacteria, Poxviridae, and Retroviridae. The putative gene was expressed in Escherichia coli, and dUTPase activity of the recombinant enzyme was demonstrated by hydrolysis of dUTP to dUMP. Deamination of dCTP by the enzyme was not detected. Phylogenetic analysis based on amino acid sequences of the characterized enzyme and its homologues showed that the dUTPase-encoding dut genes and the dCTP deaminase-encoding dcd genes constitute a paralogous gene family. This report is the first identification and functional characterization of an archaeal dUTPase and the first phylogeny derived for the dcd-dut gene family.

Little is known about the biosynthesis of nucleic acid precursors in archaea. It has been demonstrated that the extremely thermophilic archaeon Sulfolobus acidocaldarius is unable to utilize exogenous thymidine for biosynthesis of nucleic acids, presumably due to a lack of thymidine kinase (1). Thus, the de novo pathway must be the only supplier of thymidine nucleotides in Sulfolobus. Generally, the immediate thymidine nucleotide precursor in the endogenous de novo synthesis is dUMP. Three pathways for dUMP synthesis are known: (i) deamination of dCTP to dUTP by dCTP deaminase (2′-deoxycytidine-5′-triphosphate aminohydrolase, EC 3.5.4.13) and successive hydrolysis of dUTP to dUMP by dUTPase (2′-deoxyuridine 5′-triphosphatase, EC 3.6.1.23); (ii) deamination of dCMP to dUMP by dCMP deaminase (2′-deoxycytidine-monophosphate aminohydrolase, EC 3.5.4.12); and (iii) reduction of UDP or UTP to dUDP or dUTP, respectively, by ribonucleotide reductases followed by hydrolysis of dUTP to dUMP by dUTPase. The conversion of dUDP to dUMP is most probably done via phosphorylation to dUTP. In bacteria, the latter pathway contributes much less to dUMP synthesis than the pathways through deamination of deoxycytidine nucleotides (2). In higher organisms and in certain bacteria as well as in T-even phage-infected Escherichia coli cells, the deamination occurs at the monophosphate level, whereas in most enterobacteria, it occurs at the triphosphate level (3). Only one enzyme involved in the de novo pathway of thymidylate biosynthesis has recently been detected in Sulfolobus cells, thymidylate synthetase (4), the ubiquitous enzyme that catalyzes the conversion of dUMP to dTMP.
In the course of sequencing the genome of the Sulfolobus islandicus rod-shaped virus SIRV (5), we found an open reading frame (ORF) that encodes a putative protein with considerable similarity to both bacterial dCTP deaminases and dUTPases from eukaryotes, bacteria, Poxviridae, and Retroviridae. In the latter case, the homologous region is part of the gag-pol polyproteins (6-10). This dual similarity led us to functionally characterize the novel SIRV gene, which was accomplished by expression of the encoded protein in E. coli, with subsequent enzymatic tests. It also inspired a phylogenetic analysis of the dcd-dut gene family. Furthermore, the recent discovery of two members of this gene family in the whole genome sequence of Methanococcus jannaschii (11) made it obvious that a biochemical characterization of at least one archaeal member of this gene family is crucial for functional discrimination of these genes.

MATERIALS AND METHODS
Strains and Plasmids-SIRV was grown in cells of S. islandicus isolate REN2H1 (5) and purified by polyethyleneimine precipitation followed by cesium chloride gradient centrifugation. The pGEX-2T expression vector, containing the Schistosoma japonicum glutathione S-transferase gene under the control of the tac promoter (12), was purchased from Pharmacia Biotech Inc.
DNA Purification Procedures and Oligonucleotides-Virus DNA was prepared as described earlier (5). Plasmids were prepared with a plasmid kit from QIAGEN Inc. Oligonucleotides were synthesized on an Applied Biosystems Model 380B DNA synthesizer according to the manufacturer's instructions.
DNA Sequencing-Double-stranded DNA was sequenced using the U. S. Biochemical Corp. Sequenase 2.0 kit following the manufacturer's instructions. Oligonucleotide primers were complementary either to adjacent vector sequences or to portions of previously sequenced parts of the cloned DNA. To ensure accuracy, both DNA strands were sequenced.
Plasmid Construction and Transformation-The putative dut gene was amplified from SIRV DNA by the polymerase chain reaction (PCR) using Vent polymerase (New England Biolabs Inc.). Primer P1 (5′-TTATTGAATTCATCTTCTTTTGCTAATGTGAC-3′) corresponded to the 5′-end of the putative gene, and primer P1r (5′-ATCGAGGATCCATGATTCTTTCAGATAG-3′) corresponded to the 3′-end of the coding region. To facilitate the subsequent cloning steps, a BamHI restriction site was introduced in the former primer, and an EcoRI site was introduced in the latter primer. PCR was performed in a GTC-2 genetic thermal cycler (Precision Scientific) under the following conditions: 3 min of initial denaturation at 94°C followed by 33 cycles of 1 min at 94°C, 1 min at 50°C, and 1 min at 72°C each. The amplified DNA fragments were purified using QIAquick Spin PCR columns (QIAGEN Inc.) and subsequently cleaved with BamHI and EcoRI restriction endonucleases. The digested BamHI-EcoRI fragments were ligated into BamHI-EcoRI-cut pGEX-2T plasmid. The ligation mixture was incubated for 45 min at 16°C using the Pharmacia Ready-To-Go™ T4 ligase kit. E. coli JM83 competent cells were transformed with the recombinant vector as described by Smith and Johnson (12). The presence of the cloned insert in transformants was checked by plasmid preparation and subsequent digestion with appropriate restriction enzymes. To ensure that no mutations had been introduced by the PCR amplification, the entire insert was sequenced with Pharmacia primers complementary to pGEX-2T sequences adjacent to the cloning sites.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to GenBank™.
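Which primer carries which cloning site can be checked directly from the printed sequences. The sketch below searches both primers for the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences; it assumes the hyphen inside the printed P1r sequence is a line-break artifact, and the function and variable names are illustrative only. Run on the sequences as printed, the search places GAATTC in P1 and GGATCC in P1r, so the primer labels are worth confirming against the original records.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Primer sequences as printed (5'->3'). The hyphen printed inside P1r is
# assumed to be a line-break artifact and has been removed.
PRIMERS = {
    "P1": "TTATTGAATTCATCTTCTTTTGCTAATGTGAC",
    "P1r": "ATCGAGGATCCATGATTCTTTCAGATAG",
}


def find_sites(primers, sites):
    """Return {primer: {enzyme: 0-based offset}} for every site that occurs."""
    hits = {}
    for name, seq in primers.items():
        hits[name] = {
            enzyme: seq.find(motif)
            for enzyme, motif in sites.items()
            if motif in seq
        }
    return hits


site_hits = find_sites(PRIMERS, SITES)
for primer, found in site_hits.items():
    print(primer, found)
```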
Protein Expression and Purification-An E. coli JM83 recombinant clone containing pGEX-2T with the correct insert was grown at 37°C in 500 ml of Luria-Bertani medium containing 100 µg of ampicillin/ml until the A600 reached 0.8. Cells were collected 3.5 h after induction with 0.5 mM isopropyl-β-D-thiogalactopyranoside and subsequently resuspended in 10 ml of phosphate-buffered saline (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4·7H2O, and 1.4 mM KH2PO4). The resuspended cells were lysed with 1 mg/ml lysozyme; DNA was removed by digestion with 100 µg/ml DNase I, and cell debris was removed by a 20-min centrifugation at 20,000 × g. The supernatant was mixed with 1 ml of a 50% suspension of glutathione-agarose beads (Sigma) in phosphate-buffered saline. The agarose beads with the bound enzyme were collected by centrifugation, washed five times with 3 ml of phosphate-buffered saline, and resuspended in 1 ml of phosphate-buffered saline. Five NIH units/ml thrombin (purchased from Sigma) was added to the suspension to release the expressed enzyme from the agarose beads. After gently mixing for 1 h at room temperature, the beads were collected by centrifugation. The expressed enzyme in the supernatant was concentrated in Centricon-10 Quick-Start tubes (Amicon, Inc.).
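The phosphate-buffered saline composition above is given in millimolar terms; converting it to weighed amounts is a one-line calculation per salt. The sketch below is illustrative only: the molar masses are standard reference values, not figures from this paper, and the names are hypothetical.

```python
# Molar masses in g/mol (standard reference values; not taken from the paper).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4.7H2O": 268.07,
    "KH2PO4": 136.09,
}

# Concentrations in mM as listed for the phosphate-buffered saline.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4.7H2O": 4.3, "KH2PO4": 1.4}


def grams_per_litre(conc_mm, molar_mass):
    """Convert mM concentrations to g/l: (mmol/l) * 1e-3 (mol/mmol) * (g/mol)."""
    return {salt: conc_mm[salt] * 1e-3 * molar_mass[salt] for salt in conc_mm}


pbs_grams = grams_per_litre(PBS_MM, MOLAR_MASS)
for salt, grams in pbs_grams.items():
    print(f"{salt}: {grams:.3f} g/l")
```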
SDS-Polyacrylamide Gel Electrophoresis-Electrophoresis was performed on 0.7-mm gels as described by Schagger and Von Jagow (13) with 4.5% polyacrylamide in the stacking gel and 16% in the separating gel. Gels were stained with Coomassie Blue R-250.
dUTPase Assay-The standard assay was carried out by 1 h of incubation at 56°C in a 75-µl volume containing 20 mM potassium phosphate buffer, pH 6.8, 1.35 mM dithiothreitol, 7 mM MgCl2, 0.15 mM [3H]dUTP (specific activity of 1.65 Ci/mol), and the appropriate amount of the recombinant enzyme. Five-µl samples of the reaction mixtures were applied to polyethyleneimine-cellulose plates (Merck). Prior to the application of the samples, 30 nmol each of dUTP, dUDP, and dUMP were applied to the start line of the chromatogram. The plates were developed in one dimension with 1 M formic acid and 0.5 M LiCl and, after drying, examined under UV light. The marker spots corresponding to dUTP, dUDP, and dUMP were cut out; nucleotides were eluted by washing excised sections with 2 M LiCl; and radiolabel was quantified by scintillation counting.
dCTP Deaminase Assay-dCTP deaminase activity was determined by a spectrophotometric method based on the difference between the molecular extinction coefficients of deoxycytidine and deoxyuridine (3). The assay was carried out at 56°C in time intervals between 10 and 120 min. The reaction volume was 400 µl and contained 20 mM potassium phosphate buffer, pH 6.8, 1.35 mM dithiothreitol, a range of MgCl2 concentrations (0-25 mM), 1 mM dCTP, and different amounts of the recombinant enzyme (2.5-50 µg).
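The quantities in this assay can be worked through as follows. This is a minimal sketch, taking the reaction and sample volumes as microlitres; the scintillation counts are hypothetical placeholders, not data from the paper, and no background subtraction or counting-efficiency correction is applied.

```python
def substrate_nmol(conc_mm, volume_ul):
    """Amount of substrate in nmol: mM times microlitres equals nmol."""
    return conc_mm * volume_ul


def fraction_per_species(counts):
    """Fraction of the total recovered label found in each excised spot."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total counts must be positive")
    return {species: value / total for species, value in counts.items()}


# dUTP present in one 75-ul reaction and in one 5-ul sample, at 0.15 mM.
reaction_nmol = substrate_nmol(0.15, 75)
spot_nmol = substrate_nmol(0.15, 5)

# Hypothetical counts (cpm) for the three excised marker spots.
example_counts = {"dUTP": 1200.0, "dUDP": 10.0, "dUMP": 800.0}
fractions = fraction_per_species(example_counts)

# dUMP formed in the whole reaction, assuming the sample is representative.
dump_formed_nmol = fractions["dUMP"] * reaction_nmol
print(reaction_nmol, spot_nmol, fractions, dump_formed_nmol)
```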
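The principle of the spectrophotometric method can be written out explicitly. The relation below is a sketch under stated assumptions: the Beer-Lambert law holds, one dUTP is formed per dCTP deaminated, $l$ is the optical path length, and $\lambda$ is the monitoring wavelength (not specified in the text). The symbols are introduced here for illustration.

```latex
\Delta A_{\lambda}(t)
  = \bigl(\varepsilon_{\mathrm{dU},\lambda} - \varepsilon_{\mathrm{dC},\lambda}\bigr)\, l \,[\mathrm{dUTP}](t),
\qquad
[\mathrm{dUTP}](t)
  = \frac{\Delta A_{\lambda}(t)}
         {\bigl(\varepsilon_{\mathrm{dU},\lambda} - \varepsilon_{\mathrm{dC},\lambda}\bigr)\, l}
```

Under these assumptions, an unchanged absorbance ($\Delta A_{\lambda} = 0$) over the incubation corresponds to no detectable conversion of dCTP.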
Phylogenetic Analysis-Amino acid sequences were retrieved from public data bases. Multiple alignment of the sequences was performed with the help of the computer program CLUSTAL (14). Final adjustments considering obvious similarities not indicated by the alignment program were done manually after visual inspection. Phylogenies were inferred and analyzed for statistical confidence using distance, parsimony, and bootstrapping programs from the PHYLIP package (Version 3.572PC) (15) and PAUP (Version 3.1.1) (16).
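The bootstrapping step resamples alignment columns with replacement and rebuilds the tree for each replicate. The sketch below illustrates only the resampling idea, together with a simple proportion-of-differences distance; it is not the PHYLIP or PAUP implementation, the distance is a stand-in for the measures those programs use, and the toy alignment is hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical three-sequence alignment, for illustration only.
toy = ["MILSDKDL", "MILSDRDI", "MKLSD-DI"]
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_alignment(toy, rng)

print(replicate)
print(p_distance(toy[0], toy[1]), p_distance(toy[0], toy[2]))
```

Repeating the resampling many times and counting how often a given grouping reappears in the resulting trees gives the bootstrap support for that grouping.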

RESULTS
The ORF analyzed in this report was found in a 2101-base pair-long EcoRI-ClaI restriction fragment of SIRV DNA. The nucleotide sequence of the ORF and its flanking regions is presented in Fig. 1. The GC content of the coding region is 27%, close to the average GC content of SIRV DNA (25%). The putative translation start codon (ATG) is 1482 base pairs from the EcoRI cloning site, and the TTA stop codon is 145 base pairs from the ClaI cloning site. A typical archaeal "box A" promoter motif, TTAAA (17), is found 61-66 nucleotides upstream of the translation start codon. A typical Sulfolobus terminator motif,   (18), is located 11-17 nucleotides downstream of the translation termination codon (Fig. 1).
A polypeptide presumably encoded by this ORF has 158 amino acid residues and an inferred molecular mass of 16.2 kDa. When the PIR protein sequence data base (19) was searched with this amino acid sequence, the highest sequence similarities were found with hypothetical protein-3 from Desulfurolobus ambivalens (88%) (20); with dCTP deaminase from E. coli (at that time, the only known dCTP deaminase; 58%); and with dUTPases from retroviruses, lentiviruses, and poxviruses (49-59%). Statistical tests for the significance of the similarities to these sequences resulted in scores significantly higher than the mean scores between random sequences.
To find out the enzymatic properties of the protein presumably encoded by the ORF, this region of the viral genome was amplified by PCR and inserted into pGEX-2T, an E. coli expression vector containing the glutathione S-transferase gene regulated by the tac promoter and followed immediately by a cloning site containing, at the 5′-end, a sequence encoding a thrombin cleavage site. The sequence of the cloned DNA fragment was shown to be identical to the original SIRV sequence. The recombinant protein expressed in E. coli was induced and had an apparent molecular mass of ~42 kDa (Fig. 2), close to that predicted by the fusion of the 26-kDa glutathione S-transferase and the ~16.2-kDa SIRV protein. The fusion protein was bound to immobilized glutathione, and after removal of the nonspecifically bound material, the expressed SIRV protein was released by thrombin cleavage. It had an apparent molecular mass of ~15 kDa, slightly smaller than the predicted mass. Analysis on a 16% SDS-polyacrylamide gel showed it to be 94% pure (Fig. 2).
The synthesized protein was tested for putative dUTPase activity by incubation with 3H-labeled dUTP and subsequent chromatography of the reaction products on polyethyleneimine thin-layer plates. In spots corresponding to dUTP, dUDP, and dUMP, radioactivity was quantified by scintillation counting; the results are shown in Table I. The label was found in two species, one having the mobility of the input dUTP and the other having the mobility of dUMP. No significant radioactivity comigrated with dUDP. Thus, the expressed protein was capable of hydrolyzing dUTP to dUMP and therefore was a dUTPase, as suggested by the primary structure. Due to the strong sequence similarity, the expressed enzyme was also tested for dCTP deaminase activity. No change in the dCTP spectrum was observed as a result of incubation of dCTP with the expressed protein, as described under "Materials and Methods" (data not shown). Therefore, dCTP deaminase activity could be ruled out for the recombinant enzyme.
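The GC-content figures quoted above are base-composition fractions. A minimal sketch of that calculation follows; the example sequence is a hypothetical placeholder rather than SIRV sequence, and ambiguous symbols such as N are simply ignored here, which is one possible convention among several.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(base in "GC" for base in counted) / len(counted)


example = "ATTAGCAT"  # hypothetical 8-nt sequence with one G and one C
print(f"GC content: {gc_content(example):.0%}")
```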
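The reported coordinates can be cross-checked against the ORF length and the residue count. The sketch assumes that the two quoted distances are flanking lengths that do not overlap the ORF; the constant names are illustrative.

```python
FRAGMENT_BP = 2101        # EcoRI-ClaI restriction fragment
START_TO_ECORI_BP = 1482  # distance from the start codon to the EcoRI site
STOP_TO_CLAI_BP = 145     # distance from the stop codon to the ClaI site

# Length left for the ORF once both flanks are subtracted (assumption above).
orf_bp = FRAGMENT_BP - START_TO_ECORI_BP - STOP_TO_CLAI_BP

# Number of complete codons and any leftover bases.
codons, remainder = divmod(orf_bp, 3)

print(orf_bp, codons, remainder)
```

Under that assumption the flanks leave 474 base pairs, which divides into 158 complete codons, in line with the 474-base pair ORF and the 158-residue polypeptide reported in the text.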
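The mass comparison in this paragraph is simple arithmetic. The sketch below adds the two component masses and expresses each apparent mass as a relative deviation from its prediction; the apparent values are the approximate gel estimates quoted in the text, and the helper name is illustrative.

```python
GST_KDA = 26.0   # glutathione S-transferase
SIRV_KDA = 16.2  # inferred mass of the SIRV protein


def relative_deviation(observed, predicted):
    """Signed deviation of the observed mass as a fraction of the prediction."""
    return (observed - predicted) / predicted


predicted_fusion = GST_KDA + SIRV_KDA
fusion_dev = relative_deviation(42.0, predicted_fusion)  # fusion protein on gel
cleaved_dev = relative_deviation(15.0, SIRV_KDA)         # after thrombin cleavage

print(f"predicted fusion: {predicted_fusion:.1f} kDa")
print(f"fusion deviation: {fusion_dev:+.1%}, cleaved deviation: {cleaved_dev:+.1%}")
```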
Sequence alignments are the basis for molecular phylogenetic analyses. Fig. 3 shows the amino acid sequence of the novel SIRV dUTPase aligned with all presently known archaeal, bacterial, and eukaryotic homologues and most viral homologues found in the public sequence data bases by BLAST search (34). Thirteen positions are characteristic for the three prokaryotic dCTP deaminases, and four positions are characteristic for the three putative archaeal dUTPases: SIRV, D. ambivalens (hypothetical protein-3), and M. jannaschii (gene MJ1102 product). Four amino acids are universally conserved throughout all sequences (aspartic acid 57, serine 147, aspartic acid 165, and glycine 170) (Fig. 3). All four belong to the previously described conserved sequence motifs of dUTPases (40): aspartic acid 57 belongs to motif 1, serine 147 to motif 2, and aspartic acid 165 and glycine 170 to motif 3. The latter two amino acids correspond in human dUTPase structure to residues in the β-hairpin, suggested to be responsible for binding the deoxypyrimidine portion of dUTP (41); aspartic acid 165 is believed to be important for discrimination between dUTP and UTP (42). Motif 4 described by McGeoch (40) is less conserved in the three putative archaeal dUTPases and in dCTP deaminases. Even less conserved in archaeal dUTPases is motif 5 (40), suggested to be important for the catalytic activity of dUTPases (43,44); moreover, in the enzyme from M. jannaschii, the region where motif 5 is found in other dUTPases is completely absent. Based on the alignment presented, phylogenetic distances determined by PAUP using the protpars similarity matrix show the novel sequence to be most closely related to the D. ambivalens hypothetical protein-3, at a distance of 0.37. The three dCTP deaminases are rated at an average distance of 1.01, eukaryotic dUTPases at 1.29, bacterial dUTPases at 1.34, and viral homologues at 1.35. SIRV dUTPase is also the closest relative to the protein encoded by M. jannaschii gene MJ1102, with a phylogenetic distance of 1.14, compared with 1.38 for dCTP deaminases and 1.53 for bacterial and eukaryotic dUTPases.
Fig. 4 shows a circular phylogenetic diagram of the dUTPase-dCTP deaminase family. The branching pattern shown in this diagram was derived with a maximum parsimony method (using the protpars matrix in PAUP) and statistically tested by bootstrapping. Due to the short length of the molecules and the enormous phylogenetic depths of the compared samples, a complete resolution of the phylogeny could not be expected (see unresolved region in the eukaryote-Poxviridae section of the diagram). However, the differentiation between dCTP deaminases and all dUTPases, as well as the unity of the archaeal dUTPases, the common lineage of bacterial dUTPases, and the linkage between eukaryotic and Poxviridae dUTPases, was also confirmed by inferences based on a distance method (using the PHYLIP program FITCH).

DISCUSSION
The highest scoring match in an initial data base search with the SIRV gene product described herein was to putative protein-3 from D. ambivalens (20). The next best matches, respectively, were to two functionally different enzymes of the thymidine nucleotide biosynthesis pathway: dCTP deaminase and dUTPase. Although the BLAST comparison indicated a better match to dCTP deaminase from E. coli than to dUTPases, functional assays indicated that the protein encoded by the gene has dUTPase activity, but no dCTP deaminase activity. This underscores the risk in drawing conclusions about functional roles for new genes and their products by sequence comparison alone, especially for gene families with very few biochemically characterized members. Aside from the products of two E. coli genes, dcd (21) and dut (23), and three viral dut genes (35-39), no product of any other prokaryotic or viral member of the dcd-dut gene family has been biochemically characterized.
Recently, hypothetical protein-3 from D. ambivalens was classified as a "probable dCTP deaminase" (25). Since the D. ambivalens homologue is the closest relative to the SIRV gene analyzed here, this enzyme should be reclassified as a probable dUTPase based on the functional and phylogenetic results described here.
Two members of the dcd-dut gene family have been identified in the whole genome sequence of M. jannaschii: MJ0430 and MJ1102 (11). Based on sequence comparison, both genes were classified as dCTP deaminases. From the biochemical and phylogenetic analyses shown here, however, it is clear that M. jannaschii probably does not possess two homologous dCTP deaminases, but one dUTPase (MJ1102) and one dCTP deaminase (MJ0430), paralogous members of the same dcd-dut gene family. The identification of both genes in the same genome also allows us to postulate that the de novo biosynthesis pathway for thymidine nucleotides in M. jannaschii includes deamination of dCTP to dUTP by dCTP deaminase and successive hydrolysis of dUTP to dUMP and inorganic pyrophosphate by dUTPase (pathway option i in the Introduction). Further studies of this pathway in distantly related archaea, e.g. Sulfolobus, will determine whether this observation holds true across Archaea.
It seems reasonable that SIRV encodes its own dUTPase: pool sizes of dTTP in host cells might not be sufficiently high to support rapid growth of a virus with a genomic GC content of only 25%. (The GC content of the chromosomal DNA of Sulfolobus solfataricus, a close relative of the SIRV host strain, is 38%.) It is also possible that dUTPase from SIRV is involved in keeping the intracellular dUTP concentration at a low level such that incorporation of dU into DNA is minimized, and thus, the DNA repair processes are not invoked (24).