Crystal structure of TruD, a novel pseudouridine synthase with a new protein fold.

TruD, a recently discovered pseudouridine synthase in Escherichia coli, is responsible for modifying uridine 13 in tRNA(Glu) to pseudouridine. It has little sequence homology with the other 10 pseudouridine synthases in E. coli, which themselves have been grouped into four related protein families. Crystal structure determination of TruD revealed a two-domain structure. The catalytic domain differs in sequence from, but is structurally very similar to, the catalytic domain of other pseudouridine synthases. The second, large domain (149 amino acids, 43% of the total) has a novel α/β fold that has not previously been observed in any other protein.

The most abundant modification seen in structured RNAs (transfer, ribosomal, and spliceosomal RNAs) is the isomerization of uridine (U) to pseudouridine (5-ribosyluracil, Ψ;¹ Fig. 1A). Ψ is made by a set of enzymes called Ψ synthases, which select specific U residues in a polynucleotide chain for isomerization to Ψ. Ψ synthases are ubiquitous: putative synthase genes have been found in every genome sequenced so far (1). In Escherichia coli, the Ψ formation activity of 10 putative gene products has been experimentally demonstrated (see references cited in Table II of Ref. 2). These Ψ synthases share a set of amino acid sequence motifs and can be grouped into four families based on their degree of amino acid sequence homology (3–5). A fifth family, whose members have no apparent sequence homology with the others, has recently been described. It was discovered when its E. coli member was identified as the enzyme that forms Ψ13 in tRNA Glu (Fig. 1B), and this enzyme was named TruD (6). Fifty-eight TruD homologs, distributed among the genomes of Eubacteria, Archaea, and Eukarya, were identified (6). The function of these homologs is unknown except for Pus7p (YOR243c) of Saccharomyces cerevisiae, which makes Ψ13 in tRNA as well as Ψ35 in U2 snRNA (7). Because every organism known to have Ψ13 in its tRNAs also has a TruD homolog, it is reasonable to infer that the TruD homologs in those organisms are the responsible synthases. This does not, however, rule out specificity for other sites in homologs from organisms that lack tRNA Ψ13.
Crystal structures of a member from each of the original four families of Ψ synthases have revealed a striking correspondence, both in overall structure and in certain conserved amino acids of the catalytic module, including an aspartate residue at the active site (8–13). This raises the question of how the catalytic center of TruD is organized. TruD has virtually no amino acid sequence homology to these enzymes, even though it also contains an essential aspartate residue (6). In this work, we report the crystal structure of TruD at 2.2 Å resolution. TruD folds into a V-shaped molecule with two distinct modules. One module (57% of the total amino acids) closely resembles the catalytic module of the previously determined Ψ synthases despite the lack of amino acid sequence similarity. The other module (43%) has no similarity in either sequence or structure to known protein domains but is conserved in all TruD homologs. This domain, which we term the TRUD domain, appears to be a novel RNA-binding domain.

EXPERIMENTAL PROCEDURES
The truD gene was cloned into the NdeI and BamHI sites of the pET28a vector (Novagen Inc.) as described previously (6). Native and selenomethionine (SeMet)-labeled TruD were expressed and purified essentially as described for RluD (14). After purification, TruD was concentrated to 20 mg ml−1 with an Amicon Ultra centrifugal concentrator (Millipore Inc.). Diffraction-quality crystals of TruD were grown at room temperature by hanging-drop vapor diffusion. Equal volumes of protein solution and reservoir solution were mixed. The protein solution contained 5–10 mg ml−1 TruD in 20 mM HEPES, pH 8.0, 250 mM NaCl, 2 mM EDTA, 2 mM dithiothreitol, and 25% (v/v) ethylene glycol. The reservoir solution contained 0.1 M MES, pH 6.0, 12–14% (w/v) polyethylene glycol 8000, and 25% (v/v) ethylene glycol. The crystals were tetragonal (space group P4₃), with a single TruD molecule in the asymmetric unit. X-ray diffraction data were collected from a single flash-frozen SeMet TruD crystal and a single flash-frozen native TruD crystal at beamline X12-C, National Synchrotron Light Source, Brookhaven National Laboratory. Details of data collection and refinement are presented in Table I (Supplementary Materials).
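Because each drop is set up by mixing equal volumes of protein and reservoir solution, every component starts at the volume-weighted average of its two source concentrations. Vapor equilibration with the reservoir then changes these values. The minimal sketch below works through that starting arithmetic. It assumes ideal mixing with additive volumes and ignores the later equilibration. The function and dictionary names are hypothetical.

```python
def mixed_concentration(c_protein_side, c_reservoir_side, v_protein=1.0, v_reservoir=1.0):
    """Starting concentration of one component right after mixing the two solutions.

    Assumes ideal mixing with additive volumes; vapor equilibration afterwards is ignored.
    """
    total = v_protein + v_reservoir
    return (c_protein_side * v_protein + c_reservoir_side * v_reservoir) / total


# (concentration in protein solution, concentration in reservoir) for selected components
COMPONENTS = {
    "TruD, mg/ml (low)": (5.0, 0.0),
    "TruD, mg/ml (high)": (10.0, 0.0),
    "PEG 8000, % w/v (low)": (0.0, 12.0),
    "PEG 8000, % w/v (high)": (0.0, 14.0),
    "ethylene glycol, % v/v": (25.0, 25.0),
    "MES, M": (0.0, 0.1),
    "NaCl, mM": (250.0, 0.0),
}

# Equal volumes (1:1), as in the protocol above.
for name, (c_prot, c_res) in COMPONENTS.items():
    print(f"{name}: {mixed_concentration(c_prot, c_res):g}")
```

With a 1:1 drop, TruD starts at 2.5–5 mg ml−1, PEG 8000 at 6–7% (w/v), and MES at 0.05 M. Ethylene glycol, present at 25% on both sides, stays at 25%.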
Diffraction data were processed using HKL (15), and initial phases were determined from the SeMet derivative by single-wavelength anomalous dispersion (SAD) in SOLVE (16). SOLVE located three of the five expected selenium sites using SAD data to 2.2 Å and generated an initial electron density map with an overall figure of merit of 0.29. Density modification and solvent flattening in RESOLVE (17) improved the figure of merit to 0.61. The automated model-building function of RESOLVE placed ~65% of the TruD model into the electron density. The model was then extended by several rounds of model building in O (18) and refinement in CNS (19). Using this model, the structure of native TruD was solved by molecular replacement in CNS, followed by further rounds of model building and refinement as above. The N-terminal polyhistidine tag and nine amino acids at the C terminus were not observed in the electron density and could not be modeled. The final model (R = 0.21, Rfree = 0.25) contains residues 1–340 and 240 water molecules in the asymmetric unit. In this model, 88.4% of the residues are in the most favored regions of the Ramachandran plot, and no residues are in the disallowed region. Model quality was monitored using PROCHECK (20) and the MOLPROBITY web tool (21).

¹ The abbreviations used are: Ψ, pseudouridine; SeMet, selenomethionine; SAD, single wavelength anomalous dispersion; TGT, tRNA guanine transglycosylase; r.m.s.d., root mean squared deviation.
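The figure of merit, R, and Rfree quoted above follow standard crystallographic definitions. R compares observed and calculated structure-factor amplitudes. Rfree is the same statistic computed only over a test set of reflections excluded from refinement. The per-reflection figure of merit is the modulus of the probability-weighted mean phase vector. The minimal Python sketch below assumes the calculated amplitudes are already on the scale of the observed ones and that each phase probability distribution is sampled at discrete angles. The function names are hypothetical and are not the internals of HKL, SOLVE, RESOLVE, or CNS.

```python
import cmath
import math


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming Fc is already on the scale of Fo."""
    den = sum(abs(fo) for fo in f_obs)
    if den == 0:
        raise ValueError("need at least one non-zero observed amplitude")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / den


def r_work_and_free(f_obs, f_calc, is_test):
    """Return (R_work, R_free); test-set reflections are assumed to be excluded from refinement."""
    work_o, work_c, free_o, free_c = [], [], [], []
    for fo, fc, test in zip(f_obs, f_calc, is_test):
        if test:
            free_o.append(fo)
            free_c.append(fc)
        else:
            work_o.append(fo)
            work_c.append(fc)
    return r_factor(work_o, work_c), r_factor(free_o, free_c)


def figure_of_merit(phases_deg, probs):
    """Per-reflection m = |sum P(phi) exp(i phi)| / sum P(phi) for a sampled phase distribution."""
    total = sum(probs)
    centroid = sum(p * cmath.exp(1j * math.radians(phi)) for phi, p in zip(phases_deg, probs))
    return abs(centroid) / total


def mean_figure_of_merit(per_reflection_m):
    """Overall figure of merit: the average of m over all reflections."""
    return sum(per_reflection_m) / len(per_reflection_m)
```

A sharply peaked phase distribution gives m close to 1. A flat distribution, or a symmetric two-peaked one such as the phase ambiguity inherent in single-wavelength SAD data, gives m close to 0.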

RESULTS AND DISCUSSION
TruD folds into a V-shaped molecule with two domains, a catalytic domain and a TRUD domain, linked by three extended loops (Fig. 2, A and B). The catalytic domain adopts the mixed α/β fold characteristic of Ψ synthases from the four other families. The TRUD domain (residues 155–303) forms a compact fold that is tilted away from the catalytic domain. This tilt creates a deep cleft in TruD that is lined with basic residues from both domains (Fig. 2C). Several conserved basic residues extend down and out from this cleft, forming an elongated positively charged surface similar in length to the longer anticodon arm of a typical L-shaped tRNA. The deep end of this cleft, which likely defines the tRNA-binding site of TruD, leads to the putative catalytic aspartate (Asp80; Ref. 6).
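The TRUD domain boundaries (residues 155–303) can be checked against the domain percentages quoted earlier (149 amino acids, 43% of the total). The minimal sketch below makes two assumptions. First, the full polypeptide without the His tag is the 340 modeled residues plus the nine disordered C-terminal residues. Second, every residue outside 155–303 is counted with the catalytic domain. All names are hypothetical.

```python
# Hypothetical names; the numbers come from the Experimental Procedures and Results.
MODELED_LAST = 340        # final model contains residues 1-340
DISORDERED_C_TERM = 9     # nine C-terminal residues not seen in the density
TOTAL_RESIDUES = MODELED_LAST + DISORDERED_C_TERM  # 349, excluding the His tag

TRUD_DOMAIN = range(155, 304)  # residues 155-303, inclusive


def domain_fractions(total, domain):
    """Return (domain size, domain %, remainder size, remainder %), percentages rounded."""
    n_domain = len(domain)
    n_rest = total - n_domain
    return n_domain, round(100 * n_domain / total), n_rest, round(100 * n_rest / total)


print(domain_fractions(TOTAL_RESIDUES, TRUD_DOMAIN))  # (149, 43, 200, 57)
```

The result reproduces the 149-residue, 43%/57% split. This suggests the quoted percentages count the interdomain linkers with the catalytic module.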
Despite the extensive topological connections between the catalytic and TRUD domains, the interface between them is small. It covers only about 1100 Å² of solvent-accessible surface area (~11% of the total solvent-accessible surface of TruD). The interface has several gaps and is only about half hydrophobic, features characteristic of transient, non-obligate protein-protein interactions (22). It also contains several buried, well-ordered water molecules that are visible in the electron density. Together, these observations indicate a weak interface between the two TruD domains. The cleft between the catalytic and TRUD domains is too narrow to accommodate the width of a tRNA. We therefore propose that the angle between the two domains can vary, that is, that the V-shaped TruD undergoes a hinge-like motion. This hypothesis is also supported by the presence of several highly conserved glycines in the extended loops that connect the two domains. Movement upon RNA binding has been demonstrated for TruB: comparison of apo and RNA-bound TruB structures showed that a 29-residue loop reorganizes upon binding a stem-loop RNA (9, 23). Our structure cannot distinguish between two possibilities: tRNA binding may trigger the opening of the cleft, or it may trap an open form that preexists in equilibrium with the closed form seen in the crystal.
TruD may also gain access to U13 by reorganizing the tRNA Glu structure upon binding. Recent co-crystal structures of tRNA guanine transglycosylases (TGTs) with RNA have shown that substantial reorganization of RNA structure can occur without concomitant changes in protein structure. Prokaryotic TGT rearranges the end of a stem-loop RNA to fit its active site, with little change in the enzyme (24). Archaeosine TGT (an archaeal enzyme) completely unfolds the D arm of tRNA Val to gain access to G15, converting the classic L-form tRNA to a "λ-form". Even so, the enzyme structure changes little from the tRNA-free form (25). Docking of the λ-form tRNA into the cleft of TruD was attempted, but too many clashes were found (data not shown). Attempts at docking tRNA Glu from T. thermophilus (26) produced fewer clashes; however, significant unfolding of the D arm would be required to bring U13 into the catalytic pocket of TruD. Co-crystallization of TruD with tRNA Glu is under way to determine the actual mechanism TruD uses to gain access to U13.

FIG. 1. TruD modifies uridine 13 of tRNA Glu to pseudouridine.
A, the conversion of uridine to pseudouridine. The dotted line represents the axis about which the uracil ring is proposed to rotate after cleavage of the N-C glycosyl bond and before formation of the C-C link. B, the substrate of TruD in E. coli is U13 of tRNA Glu, shown here after modification to Ψ (marked by arrow). S, mnm⁵s²U; T, m⁵U.

FIG. 2. … with a novel fold, which we term the TRUD domain. Ψ synthase structural motifs I (purple), II (yellow), IIa (peach), III (dark blue), and IIIa (green) are highlighted. The aspartate residue essential for catalytic activity (Asp80) is shown in ball-and-stick representation. B, same as in A except that the view has been rotated 140° about the y axis and tilted 20° downward about the x axis. Sites of insertions found in TruD homologs (see also Fig. 3) are marked by arrowheads. C, molecular surface of TruD colored by electrostatic potential (−8.9 kcal/mol in red to 8.9 kcal/mol in blue), calculated using GRASP (29). Same view as in B. All panels were prepared with PyMOL (30).
How does TruD select tRNA Glu from all the other tRNAs in the cell? In yeast, the TruD homolog Pus7p has been suggested to recognize the sequence motif Pu(G/C)UNΨAPu at positions 9–15, with U11 being particularly important (7). In E. coli, only tRNA Glu has Ψ13, and only tRNA Gly (NCC) has U13. Their sequences at positions 9–15 are CGUCΨAG and CGUAUAA, respectively. Both have U11, and both differ from the yeast motif only at position 9, where each has C instead of a purine. Because tRNA Glu is a substrate despite this difference at position 9, that variation cannot explain why tRNA Gly (NCC) is not a substrate. The discrimination must therefore depend on some more subtle feature distinguishing the two tRNAs.
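The argument can be made explicit with a minimal sketch. It assumes the yeast motif is written as a regular expression over positions 9–15, with U at position 13 because the enzyme acts on unmodified tRNA. The pattern and dictionary names are hypothetical, and a regular expression is only a stand-in for whatever the enzyme actually recognizes. Both E. coli sequences fail the strict motif at position 9, and both pass once position 9 is relaxed. The motif alone therefore cannot separate the substrate from the non-substrate.

```python
import re

# Yeast Pus7p motif at positions 9-15: Pu (G/C) U N U A Pu, with U13 the target.
YEAST_MOTIF = re.compile(r"[AG][GC]U[ACGU]UA[AG]")
# The same motif with position 9 relaxed to any nucleotide.
RELAXED_MOTIF = re.compile(r"[ACGU][GC]U[ACGU]UA[AG]")

# Unmodified positions 9-15 of the two E. coli tRNAs discussed in the text.
E_COLI_9_TO_15 = {
    "tRNA Glu": "CGUCUAG",        # U13 becomes Psi13 in vivo
    "tRNA Gly (NCC)": "CGUAUAA",  # U13 is not modified
}


def motif_report(seqs):
    """Map each name to (matches strict motif, matches position-9-relaxed motif)."""
    return {
        name: (bool(YEAST_MOTIF.fullmatch(s)), bool(RELAXED_MOTIF.fullmatch(s)))
        for name, s in seqs.items()
    }


for name, (strict, relaxed) in motif_report(E_COLI_9_TO_15).items():
    print(f"{name}: strict={strict}, relaxed={relaxed}")
```

Both tRNAs give the same result (strict=False, relaxed=True), which is the point of the paragraph above.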
A DALI (27) search revealed that the closest structural neighbors of TruD in the Protein Data Bank are other Ψ synthases. TruA (Z score = 7.8; r.m.s.d. 3.1 Å over 156 Cα atoms) and TruB (Z score = 7.3; r.m.s.d. 3.2 Å over 144 Cα atoms) are the most similar. They are followed by RsuA (Z score = 5.9; r.m.s.d. 2.9 Å over 119 Cα atoms) and RluD (Z score = 3.8; r.m.s.d. 3.1 Å over 105 Cα atoms). An alignment is shown in Fig. 3. TruA and TruB are two Ψ synthases that also bind tRNAs. Superposing their backbones onto TruD shows that the structural overlap is extensive except for the PUA domain of TruB and the TRUD domain of TruD (Fig. 4A). The overlap in the catalytic fold of TruA, TruB, RsuA, and RluD was characterized previously (11, 12). It can be summarized as ten shared elements (six β strands, two α helices, and two loops) that occur in the same linear order along the amino acid sequence: β1, β2, α1, L4, β3, β5, β10, β11, L21, α8 (Fig. 2A). The catalytic aspartate is always located in loop L4. TruD shares this arrangement, except that its first β strand corresponds to the last β strand of the other four families (Fig. 3). It is striking that TruD retains the same fold while the linear order of these elements has been rearranged.
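The r.m.s.d. values above are computed over the Cα atoms that DALI aligns between two structures, after optimal rigid-body superposition. DALI's Z score is a separate statistic derived from its distance-matrix similarity score and is not reproduced here. The sketch below covers only the r.m.s.d. part. It uses Horn's closed-form quaternion method, with simple power iteration to find the largest eigenvalue of the 4 × 4 key matrix. It assumes the two coordinate lists are already paired residue by residue. The function names are hypothetical.

```python
import math


def _centered(coords):
    """Shift (x, y, z) tuples so that their centroid lies at the origin."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(p_coords, q_coords, iterations=5000):
    """RMSD of two residue-paired C-alpha sets after optimal superposition (Horn's quaternion method)."""
    if len(p_coords) != len(q_coords) or not p_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    P, Q = _centered(p_coords), _centered(q_coords)
    # Correlation matrix S[a][b] = sum_i P_i[a] * Q_i[b].
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = S
    # Horn's symmetric key matrix: its largest eigenvalue is the maximum of
    # sum_i Q_i . (R P_i) over all proper rotations R.
    N = [
        [xx + yy + zz, yz - zy, zx - xz, xy - yx],
        [yz - zy, xx - yy - zz, xy + yx, zx + xz],
        [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
        [xy - yx, zx + xz, yz + zy, -xx - yy + zz],
    ]
    g = sum(x * x + y * y + z * z for x, y, z in P + Q)
    shift = g / 2.0  # every eigenvalue of N is >= -g/2, so N + shift*I is positive semidefinite
    v = [1.0, 0.3, 0.2, 0.1]
    for _ in range(iterations):  # power iteration on the shifted matrix
        w = [sum(N[r][c] * v[c] for c in range(4)) + shift * v[r] for r in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix gives its largest eigenvalue.
    lam = sum(v[r] * N[r][c] * v[c] for r in range(4) for c in range(4))
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(P)))
```

Production tools typically solve the same problem with an SVD (Kabsch) or a characteristic-polynomial root finder. Power iteration is used here only to keep the sketch free of dependencies.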
Five conserved motifs in the TruA, TruB, RsuA, and RluA families of Ψ synthases have been characterized by comparing the TruA, TruB, RsuA, and RluD structures (12). The corresponding structural alignment shows that TruD also contains these five motifs (Fig. 3). The major difference is that motif IIIa appears first in TruD. In the TruA, TruB, RsuA, and RluD Ψ synthases, residues from four motifs come together in the largely hydrophobic active site (12). These residues are:
- the essential catalytic Asp (motif II);
- a basic residue, usually Arg or Lys, that forms a salt bridge to the Asp (motif III);
- a tyrosine that stacks on the uracil of the substrate uridine (motif IIa) (9, 23);
- two hydrophobic residues, an Ile or Val from motif III and a Leu from motif IIIa.

In TruD, Asp80 from motif II is conserved, but its salt-bridge partner is no longer a conserved basic residue from motif III, because Thr331 occupies this position (Fig. 4B). Instead, Lys21 from loop 1 forms the salt bridge with Asp80. Phe131 replaces the motif IIa Tyr, and a Phe could serve the same stacking function. The motif IIIa Leu has been replaced by the bulkier hydrophobic side chain of Phe27. The hydrophobic Ile or Val of motif III has been replaced by a smaller hydrophobic residue, Ala330.
The TRUD domain adopts a three-dimensional fold not seen in any other protein of known structure. A DALI (27) search with either the entire TruD structure or the TRUD domain alone found no structural homologs of the TRUD domain in the Protein Data Bank. Moreover, a PSI-BLAST (28) search of the non-redundant sequence databases showed that the TRUD domain sequence is always associated with a TruD-type catalytic domain. It is not found on its own or attached to another type of protein as a separate module. Conversely, no TruD-type catalytic domain lacks the TRUD domain insert. We found no sequence in any other protein that is significantly similar to the TRUD domain. Even so, a homolog may yet appear in a newly sequenced genome, or the fold may occur without sequence conservation in a protein whose structure has not yet been determined.
The TRUD domain is characterized by two conserved sequence motifs (motifs IV and V in Ref. 6). The first of these (motif IV; residues 155–169 in Fig. 3) forms a long extended loop. This loop starts at the bottom of the interdomain interface and extends up to form part of the cleft that likely binds tRNA. The motif contains several conserved glycines as well as two aromatic residues (Tyr159, Phe160) that form part of the hydrophobic core of the TRUD domain. The solvent-exposed side chains of Arg164 and Phe165 in this motif extend out of the TruD cleft and would likely be involved in tRNA binding. Gln163 is also solvent exposed, but in our structure it interacts weakly with the putative catalytic Asp80.
The second sequence motif (motif V; residues 196–211 in Fig. 3) lies along helix α4, which spans the full length of the TRUD domain. Many of the conserved residues in this motif contribute to the hydrophobic core of the TRUD domain. Arg210 in this motif forms a salt bridge with the highly conserved Asp223. In many archaeal and eukaryotic homologs, the TRUD domain sequence also contains large insertions at several specific sites (marked by solid triangles in Fig. 3). These insertions all map to solvent-exposed loops that face away from the TruD cleft and the putative tRNA-binding surface (Fig. 2B).
In summary, TruD folds into a V-shaped molecule. Its catalytic domain is structurally very similar to the catalytic modules of the other known Ψ synthases despite the lack of sequence homology, and it likely arose by divergent evolution. In addition, TruD has acquired a large separate domain that is likely involved in substrate recognition. This domain has a unique fold not present in the Protein Data Bank and may represent a new RNA-binding module.