Identification of the tRNA-Dihydrouridine Synthase Family*

5,6-Dihydrouridine (D) is a modified base found abundantly in the D-loops of tRNA from Archaea, Bacteria, and Eukarya. D is thought to be formed post-transcriptionally by the reduction of uridines in tRNA transcripts. Despite its abundance, no enzymes that catalyze D formation have been identified. Using comparative genomics and computational methods, we have identified members of the cluster of orthologous genes COG0042 as putative dihydrouridine synthase-encoding genes. Escherichia coli contains three COG0042 family members (yjbN, yhdG, and yohI). Strains were created in which one, two, or all three of the COG0042 genes were deleted. Purified tRNA samples were investigated from the three single and the three double knockout strains, as well as from the triple deletion strain. The results showed that the COG0042 gene family is responsible for tRNA-dihydrouridine synthase activity in E. coli. They also suggest that the COG0042-encoded family members act site-specifically on the tRNA D-loop and contain non-redundant catalytic functions in vivo.


5,6-Dihydrouridine (D) is the most abundant modified base in prokaryote and eukaryote tRNA (1). This non-aromatic base is found almost exclusively at conserved positions in the D-loop (Fig. 1A) (1). The tRNA of Escherichia coli contains up to four D residues with the vast majority containing at least one D. (Only tRNA Tyr and tRNA Glu do not contain D.) Additionally, D has been found at position 2449 of the E. coli 23 S rRNA (2). Despite the widespread occurrence of this non-aromatic base, little is known about its biochemical roles. Evidence that dihydrouridine may destabilize the structure and thus enhance conformational flexibility of tRNA was presented by McCloskey and coworkers (3). They showed that a short D-containing oligonucleotide favors the C2′-endo ribose conformation as compared with the equivalent U-containing oligonucleotide that favors the C3′-endo conformation. The C3′-endo conformation is necessary for base-stacked RNA. The same group argued that thermophilic organisms generally possess lower levels of D in their tRNA because of the lesser need for inherent tRNA conformational flexibility at high growth temperatures (4).
Despite the abundance of D in tRNA of prokaryotes and eukaryotes, the genes encoding the enzymes for dihydrouridine synthesis have not been identified. Aware that D occurs in specific positions in the D-loop and that different tRNA from the same organism contain varying amounts of D, we hypothesized that prokaryote and eukaryote genomes encode more than one enzyme and that their substrate specificities are distinct. However, without genetic selections or screens that utilize in vivo phenotypes, identifying the gene(s) responsible for dihydrouridine synthesis has been difficult. To address this problem in a new way, we took advantage of the growing number of complete genomic data bases. To reduce the number of possible candidates, we used computational algorithms to identify genes that are specific to organisms that synthesize dihydrouridine. This approach ultimately narrowed down the candidates to three E. coli genes encoding putative dihydrouridine synthase (DUS) activity. Their associated biochemical phenotypes were investigated.

Strains and Growth Conditions
Bacteria were routinely grown in mineral standard (MS) medium (5) containing 2 g of glucose/liter or in LB medium (6). Growth media were solidified with 15 g/liter agar for the preparation of plates. Transformations and P1 transductions were performed following standard procedures (6,7). See Table I for complete strain list.

Plasmid and Strain Construction
Construction of pRJ1774-The entire yhdG-fis operon extending from the upstream HindIII site to the downstream BglII site was cloned between the HindIII and BamHI sites of pACYC184 (8) to generate pRJ1772. Plasmid pRJ1774 (p15A ori Cm r yhdG +) was derived from pRJ1772 by deletion of the DNA segment between the internal BstEII sites (to inactivate the fis gene).
Construction of yhdG::Km r -Plasmid pRJ753, which contains most of the yhdG-fis operon cloned into pUC9 (9), was digested with NruI, which uniquely cleaves within the yhdG gene. The 1.3-kb HindIII-SmaI fragment encoding Km r and its promoter from Tn5 was treated with T4 DNA polymerase and dNTPs to generate flush ends and ligated with NruI-cut pRJ753 to give pRJ770. Plasmid pRJ770 has Km r inserted in the orientation opposite to that of yhdG at codon 208 (within the 321-amino acid coding region). The insertion was recombined from pRJ770 into the E. coli chromosome as described previously (9,10) to give RJ1740.
Construction of yjbN::Cm r -The gene yjbN was amplified by PCR from MG1655 (11) genomic DNA using the oligonucleotides ATAACATAACCCCATGATATATCG and ACGGTGCGCAGGGCGTGGTGAATTTG. The PCR fragment was inserted into pPCR4-topo (Invitrogen) using the manufacturer's recommended conditions to give pVDC596. The 1.2-kb AccI fragment encoding Cm r from ploxPCm (12) was then cloned into the AccI site within yjbN in pVDC596 to give pVDC202, in which Cm r is in the same orientation as yjbN.
* This work was supported by Grant GM23562 from the National Institutes of Health and by a fellowship from the National Foundation for Cancer Research. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Construction of yohI::Spec r -The gene yohI was amplified from MG1655 (11) genomic DNA using the oligonucleotides CAGGCATAAATCAGGACGAAAG and GACGCCACCAAACAACCGTGCG and cloned into pPCR4-topo to give pVDC599. The Spec r SmaI fragment from Omegon-Spec r (13) was cloned into HincII-digested pVDC599 to give pVDC201.
A 3-kb PCR fragment from pVDC201 and a 1.8-kb fragment from pVDC202 were amplified with Taq polymerase using the universal forward and reverse primers. The fragments were gel purified and dialyzed against water before being transformed into electrocompetent KM44 and JC8679 (14,15). (In these backgrounds, recBC gene products are overexpressed, allowing efficient recombination of linear PCR fragments.) For the construction of yjbN::Cm r, attempts were made in both KM44 and JC8679. Cm r clones took 8 to 10 days to appear on selection plates. Selected clones were tested by PCR for insertion at the correct locus. Two independent clones were kept for further studies. The yhdG::Km r and yohI::Spec r markers were transferred into strain MG1655 (11) by P1 transduction. Iterative P1 transductions were performed to generate double and triple deletion DUS strains.
Expression and Purification of Total tRNA
E. coli strains were grown in 2 liters of LB supplemented with the appropriate antibiotics and harvested during late-log phase growth (OD 600 ~0.6-0.8). Frozen cell pellets were resuspended in 20 ml of resuspension buffer (0.3 M NaOAc, 10 mM EDTA, pH 5.2) and extracted with buffer-saturated phenol (pH 4.3, Sigma). The aqueous layer was collected, and the RNA was precipitated with ethanol. The tRNA was purified on a Nucleobond AX-500 column (CLONTECH, Palo Alto, CA) according to the manufacturer's protocol.

Assay for tRNA Dihydrouridine Content
Colorimetric measurement of tRNA D content was performed by the method of Jacobson and Hedgcoth (17). N-phenyl-p-phenylenediamine and 2,3-butanedione monoxime solutions were prepared as described by Hunninghake and Grisolia (18), except that the free base N-phenyl-p-phenylenediamine (Aldrich) was used instead of the hydrochloride salt.

RESULTS
Comparative Genomic Identification and Analysis of the 5,6-Dihydrouridine Synthase Gene Family-We reasoned that an activity that catalyzes a 2 mass unit change (on a ~25-kDa tRNA) from unknown hydride and proton donors would be difficult to purify from cellular lysates. Instead, we used comparative genomics to identify candidates for the missing DUS activity (Fig. 1B). Our first assumption was that, because the D modification is ubiquitous, a DUS-encoding gene family should be present in the list of COGs generated by Tatusov et al. (19). At the time of the analysis the COG data base contained 3307 COG families.
The first criterion used the predicted phylogenetic distribution of this family. The D modification has not been detected in tRNA of the hyperthermophilic archaeon Pyrococcus furiosus (20,21). We investigated COGs that were absent in P. furiosus and present in Methanobacterium thermoautotrophicum, Bacillus subtilis, Saccharomyces cerevisiae, and E. coli. Biochemical data (1,22) showed D was present in tRNA from the latter four organisms. Eighty-six COGs were retrieved, of which six represented uncharacterized or poorly characterized gene families. Using the ERGO website (footnote 2), we looked at genes that were physically linked to genes of the six identified COGs. Only one, COG0042, had a member (in Methanococcus jannaschii, MJ0144) that co-clustered with tRNA modification genes such as glutamyl-tRNA reductase, nucleotidyltransferase, and tRNA pseudouridine 55 synthase.
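The phylogenetic filter described above (keep COGs present in the four D-containing reference genomes and absent from P. furiosus) can be illustrated with a minimal sketch. The presence/absence table, the identifiers COG_TOY_A and COG_TOY_B, and the function name are invented for illustration; this is not the tool or the data used in the study.

```python
# Toy phylogenetic-profile filter. Only COG0042 and the organism names come
# from the text; the other COG ids and the profile table are hypothetical.

REQUIRED_PRESENT = {
    "Methanobacterium thermoautotrophicum",
    "Bacillus subtilis",
    "Saccharomyces cerevisiae",
    "Escherichia coli",
}
REQUIRED_ABSENT = {"Pyrococcus furiosus"}


def filter_by_profile(profiles, present, absent):
    """Return COG ids found in every `present` genome and in no `absent` genome."""
    hits = []
    for cog_id, genomes in profiles.items():
        if present <= genomes and not (absent & genomes):
            hits.append(cog_id)
    return sorted(hits)


# Hypothetical profiles: COG id -> set of genomes with at least one member.
toy_profiles = {
    "COG0042": REQUIRED_PRESENT | {"Methanococcus jannaschii", "Aquifex aeolicus"},
    "COG_TOY_A": REQUIRED_PRESENT | {"Pyrococcus furiosus"},  # fails: present in P. furiosus
    "COG_TOY_B": {"Escherichia coli", "Bacillus subtilis"},   # fails: missing two required genomes
}

candidates = filter_by_profile(toy_profiles, REQUIRED_PRESENT, REQUIRED_ABSENT)
print(candidates)
```

In the study this kind of filter reduced 3307 COG families to 86; the later steps (restricting to poorly characterized families and inspecting gene neighborhoods) are not modeled here.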
COG0042 encodes proteins of unknown function annotated as "predicted TIM-barrel enzymes, possibly dehydrogenases, nifR3 family" (19). This family had members in all sequenced D-containing organisms. Some organisms contained only one member of this family (MJ0144 in M. jannaschii or aq1598 in Aquifex aeolicus), some two, some three (yohI, yjbN, and yhdG in E. coli), and some even four paralogs (S. cerevisiae).
2 wit.integratedgenomics.com/igwit.
Finally, PSI-BLAST analysis (23) revealed similarities of the proteins of the COG0042 family to proteins of the dihydroorotate dehydrogenase (DHODH) and dihydropyrimidine dehydrogenase (DHPDH) families of enzymes (Fig. 2). These enzymes catalyze the oxidation of dihydroorotate to orotate in pyrimidine biosynthesis (24) and the NADPH-dependent reduction of uracil to 5,6-dihydrouracil (25), respectively. This latter reaction is directly analogous to the reaction catalyzed by a hypothetical DUS. However, DHPDH uses uracil, whereas a DUS enzyme would presumably require an intact tRNA substrate (Fig. 2A).
Interestingly, we found no significant homology between COG0042 and known RNA-binding proteins or conserved RNA-binding domains. However, in higher eukaryotes some COG0042-containing genes contain a double-stranded RNA-binding domain (DSRM) (26) on the C-terminal side of the COG0042 region (27).
Three-dimensional structural information is available for both a DHODH, PyrDB (28), and a mammalian DHPDH (29) that have regions of sequence similarity to COG0042. The PyrDB gene product from Lactococcus lactis is a 311-amino acid polypeptide that aligns with the 321-amino acid E. coli YhdG gene product (one of three E. coli COG0042 proteins) over an ~200-amino acid region (Fig. 2B). The crystal structure of the PyrDB gene product showed that it adopts an α/β barrel fold with a bound FMN (28). DHPDH is a much larger and more complex protein than either PyrDB or YhdG (29). Domain IV of DHPDH aligns with the PyrDB and YhdG proteins and comprises part of the pyrimidine- and FMN-binding regions of DHPDH. Domain IV also adopts an α/β barrel fold. These findings suggest that the COG0042-encoded proteins comprise a family of FMN-binding α/β barrels. However, both the PyrDB protein and domain IV of DHPDH require association with other proteins (non-covalently for the PyrDB protein, covalently for DHPDH domain IV) for full catalytic activity (28,29). Thus, the COG0042 family members may also be part of a larger DUS complex.
Reduced Dihydrouridine Content in E. coli tRNA from Single COG0042 Deletion Strains-To test the hypothesis that COG0042 encodes a ubiquitous family of dihydrouridine synthase enzymes, we generated deletion strains for each of the COG0042 family members in E. coli, ΔyhdG, ΔyjbN, and ΔyohI. (See "Experimental Procedures.") The yhdG and yohI deletions were readily obtained and could be transduced into a standard genetic background (MG1655) (11). However, chloramphenicol-resistant ΔyjbN::Cm r mutants emerged only after long incubation times (8-10 days) in a recombination-enhanced strain, JC8679 (15). The yjbN::Cm r cassette could not be transferred into a wild-type background, suggesting that some mutation event may have been necessary for the isolation of a viable ΔyjbN strain (footnote 3). We are further investigating this possibility.
To ensure that strain background did not affect dihydrouridine content in tRNA, the tRNA from each deletion strain was compared with that from its corresponding parent strain, MG1655 for ΔyhdG and ΔyohI and JC8679 for ΔyjbN.
We purified total tRNA from the two parent strains and the three single deletion strains and analyzed the samples for dihydrouridine content. To quantify dihydrouridine content we used two established procedures that each utilize the chemical instability of dihydrouridine under strongly basic conditions (17,30). Treatment of RNA with 0.1 M KOH (to which the standard RNA bases are inert) leads to hydrolysis of the nonaromatic dihydrouracil ring, yielding N-ribosyl-3-ureidopropionic acid. The extent of this reaction can be measured through colorimetric detection of the ureido group of N-ribosyl-3-ureidopropionic acid (17) or by monitoring the loss of absorbance at 235 nm (30). These methods yielded identical results (data not shown). Therefore, only the data from the colorimetric assay are presented here.
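Because the results that follow are expressed as D content relative to the parent strain, the normalization can be sketched as below. The absorbance values are hypothetical, and the sketch assumes that the colorimetric signal is linear in D content and that all readings were taken on equal amounts of tRNA; the paper does not state how its readings were processed.

```python
def percent_of_wild_type(sample_abs, wild_type_abs, blank_abs=0.0):
    """Express a D-assay reading as a percentage of the wild-type reading.

    Assumes a linear response and equal tRNA input for every reading.
    """
    wt_signal = wild_type_abs - blank_abs
    if wt_signal <= 0:
        raise ValueError("wild-type signal must exceed the blank")
    return 100.0 * (sample_abs - blank_abs) / wt_signal


# Hypothetical absorbance readings (not data from the study).
blank = 0.040
readings = {"parent strain": 0.420, "deletion strain": 0.230}

for strain, absorbance in readings.items():
    relative = percent_of_wild_type(absorbance, readings["parent strain"], blank)
    print(f"{strain}: {relative:.0f}% of parent D content")
```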
All three of the single COG0042 deletion strains, ΔyhdG, ΔyohI, and ΔyjbN, contain reduced levels of D when compared with their corresponding parent strains (Fig. 3). The greatest defect in D production was found in the ΔyjbN strain, which contained about 50% of the D in a wild-type strain (Fig. 3A). The tRNA from ΔyhdG had a somewhat smaller defect in D production, and it contained roughly 65% of the D found in wild-type tRNA (Fig. 3B). In terms of mole percentage, YohI appears to be the least important of the three DUS enzymes, in that deletion of yohI gave only a small yet reproducible reduction in tRNA D content (Fig. 3C). These findings strongly support the comparative genomic analysis that led to the identification of COG0042 as a candidate for DUS activity.
In the E. coli genome, the yhdG gene lies directly upstream of fis, a broad regulator of gene transcription (31). (yjbN and yohI are not located in operons.) Therefore, we wanted to ensure that the observed decrease in D in the ΔyhdG strain was not due to a nonspecific disruption of the E. coli transcriptional machinery. We introduced a plasmid, pRJ1774, containing the yhdG gene into the previously described ΔyhdG strain. We then purified tRNA from the strain expressing plasmid-borne YhdG and compared its D content to both wild-type tRNA and tRNA from the ΔyhdG strain (Fig. 3D). We found that introduction of yhdG significantly increased the level of D in tRNA from the ΔyhdG strain. The tRNA harvested from the pRJ1774-transformed cells contained ~93% of the D content of a wild-type strain, whereas the ΔyhdG tRNA contained 65% when compared with wild-type. This genetic and biochemical complementation demonstrates that the low D content of the yhdG deletion strain is a direct result of the disruption of the gene and not an "off-target" artifact.
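One way to read the complementation numbers is as the fraction of the deletion-induced deficit that the plasmid restores. The short sketch below computes that fraction from the approximate percentages quoted in the text (65% and 93% of wild type); the function name and the idea of reporting a "rescued fraction" are illustrative additions, not a metric used by the authors.

```python
def rescued_fraction(mutant_pct, complemented_pct, wild_type_pct=100.0):
    """Fraction of the mutant's D deficit restored by complementation."""
    deficit = wild_type_pct - mutant_pct
    if deficit <= 0:
        raise ValueError("mutant shows no deficit relative to wild type")
    return (complemented_pct - mutant_pct) / deficit


# Approximate values quoted in the text for the yhdG deletion strain.
fraction = rescued_fraction(mutant_pct=65.0, complemented_pct=93.0)
print(f"Plasmid-borne yhdG restores about {fraction:.0%} of the missing D")
```

With these inputs the plasmid restores roughly four-fifths of the deficit (28 of 35 percentage points).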
Additivity of Dihydrouridine Defects in E. coli tRNA from Double COG0042 Deletion Strains-We next sought to investigate the effects of deleting multiple COG0042 genes in the same E. coli strain. To accomplish this, we generated all three combinations of double COG0042 deletions, ΔyhdG/ΔyohI, ΔyjbN/ΔyhdG, and ΔyjbN/ΔyohI. As previously noted, the yjbN::Cm r marker could not be readily transduced into different genetic backgrounds. Therefore, the ΔyjbN/ΔyhdG and ΔyjbN/ΔyohI strains were generated by transducing the yhdG::Km r and yohI::Spec r markers into the previously isolated ΔyjbN strain. We next purified tRNA from all three of these strains and analyzed the samples for D content (Fig. 4). The D deficiencies from these double deletion strains were close to the sum of the defects for each single deletion strain. For example, the ΔyhdG/ΔyohI tRNA (Fig. 4A) possessed D levels that were slightly below those of the ΔyhdG strain (Fig. 3B). Likewise, both the ΔyjbN/ΔyhdG (Fig. 4B) and ΔyjbN/ΔyohI strains (Fig. 4C) have D levels that are reduced to a greater extent than that of the ΔyjbN strain (Fig. 3A). The additivity of these defects is striking and suggests that the activity of each enzyme is independent. The additivity also suggests that the different enzymes have distinct substrate specificities, either for particular tRNA molecules or for particular positions on the D-loop of varying tRNA species.
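The additivity argument can be made concrete with a small calculation. The sketch below takes the approximate single-deletion values quoted earlier (about 50% of wild-type D remaining for ΔyjbN and about 65% for ΔyhdG) and predicts the double-deletion level under a strictly additive model. The model ignores the fact that the single deletions were measured against different parent strains, and no numerical value for ΔyohI is given in the text, so it is left out.

```python
# Approximate D content remaining (% of parent strain) in single deletions,
# taken from the text. The additive model itself is an illustrative assumption.
SINGLE_REMAINING = {"yjbN": 50.0, "yhdG": 65.0}


def additive_prediction(deleted_genes, remaining):
    """Predict remaining D (% of wild type) if each deletion removes an independent share."""
    total_defect = sum(100.0 - remaining[gene] for gene in deleted_genes)
    return max(0.0, 100.0 - total_defect)


double = additive_prediction(["yjbN", "yhdG"], SINGLE_REMAINING)
print(f"Predicted D remaining in the yjbN/yhdG double deletion: {double:.0f}%")
```

Under this simple model about 15% of wild-type D would remain in the ΔyjbN/ΔyhdG strain, which is at least compatible with the third enzyme, YohI, making only a small contribution to total D content.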
Analysis of Overexpressed tRNA 2 fMet from Single COG0042 Knockout Strains-To investigate further, we purified tRNA 2 fMet from wild-type E. coli and from the three single DUS deletion strains, ΔyjbN, ΔyhdG, and ΔyohI. This tRNA contains only one D (Fig. 5A) (1) and has been previously overexpressed from a pUC-based plasmid, pUCtRNFM (16,32). If the DUS activities of E. coli are redundant, then we would expect that tRNA 2 fMet from a deletion strain would have the same reduced D content as total tRNA. However, if the three DUS enzymes are position-specific, purified tRNA 2 fMet should have wild-type levels of D when purified from certain deletion strains and no D when purified from others.
We transformed wild-type, ΔyjbN, ΔyhdG, and ΔyohI E. coli with pUCtRNFM and purified the overexpressed tRNA 2 fMet. Sufficient samples of pure tRNA 2 fMet (Fig. 5B) were obtained from all deletion strains to analyze for D content. We found that tRNA 2 fMet from wild-type, ΔyhdG, and ΔyohI strains contained indistinguishable levels of D (Fig. 5C). These tRNA 2 fMet samples also contained ~40% of the D content of total wild-type tRNA, in good agreement with published reports that E. coli contains roughly 2.5 D residues per tRNA molecule (data not shown) (17). However, tRNA 2 fMet that was purified from a ΔyjbN strain contained no detectable D (Fig. 5C). These data demonstrate that YjbN is solely responsible for D formation at position 21 of tRNA 2 fMet.
A Completely Dihydrouridine-deficient E. coli Strain-To investigate the possibility of obtaining an E. coli strain that contains no D in its tRNA, we transduced the yohI::Spec r marker into the ΔyjbN/ΔyhdG background. Accordingly, we obtained a strain in which all three E. coli COG0042 genes were deleted (ΔyjbN/ΔyhdG/ΔyohI). Purified tRNA from this strain contains no detectable D (Fig. 6). Thus, the COG0042 genes are collectively responsible for all detectable D formation in E. coli tRNA. This result does not exclude the possibility that these DUS enzymes partner with other proteins to complete their redox cycle, but it is clear that there is no significant alternative pathway for D synthesis. Thus the triple deletion strain is an organism in which the tRNA normally contains D but has been engineered to lack D. We have not yet determined if the single D in the 23 S rRNA is present. The ΔyjbN/ΔyhdG/ΔyohI strain does not possess any readily apparent growth defects, allowing for the possibility that the evolutionary advantages of DUS activity are subtle.
3 V. de Crécy-Lagard and P. Schimmel, unpublished results.
DISCUSSION
Although the widespread presence of D in tRNA has been known for decades (33), the activity of a DUS enzyme has not been previously identified. Here we have shown that a widely distributed but previously uncharacterized gene family, COG0042, is responsible for the formation of D in the D-loops of tRNA. These genes are predicted to encode a family of FMN-binding α/β barrels based on their homology to DHODH and DHPDH. As stated earlier, identification of the gene for DUS by conventional methods is inherently difficult. By using comparative genomics to guide genetic and biochemical experiments, we were able to overcome technical pitfalls. Approaches similar to those used here will be useful for assigning functions to other gene families of unknown function. This is particularly true for genes that control post-transcriptional or post-translational modifications, because the products of these reactions are usually characterized prior to the identification of the modification enzymes.
Based on the chemical similarities of the catalyzed reactions, it is perhaps not surprising that comparative genomics yielded a candidate gene family that bears structural resemblance to DHODH and DHPDH. However, COG0042 contains no known RNA-binding domains and is not predicted to interact with RNA in annotated genomic data bases. This raises the question of whether COG0042 gene products specifically recognize the D-loop of tRNA or whether they must associate with an RNA-binding protein. Although few proteins are known to bind to the D-loop of tRNA (34-36), preliminary studies suggest that purified E. coli YhdG binds the D-loop in vitro (footnote 4).
Finally, all of the data presented in this study center on the DUS from E. coli. The available genomic and biochemical information shows that the patterns and degrees of D modification vary substantially from organism to organism. For example, tRNA Trp in E. coli contains three Ds whereas tRNA Trp from B. subtilis contains only one. Likewise, D-containing organisms can contain one (A. aeolicus), two (B. subtilis), three (E. coli), or even four (S. cerevisiae) DUS. Therefore, each organism may contain a unique pattern of positional specificity for its set of DUS.