Molecular Architecture and Ligand Recognition Determinants for T4 RNA Ligase*

RNA ligase type 1 from bacteriophage T4 (Rnl1) is involved in countering a host defense mechanism by repairing 5′-PO4 and 3′-OH groups in tRNALys. Rnl1 is widely used as a reagent in molecular biology. Although many structures for DNA ligases are available, only fragments of RNA ligases such as Rnl2 are known. We report the first crystal structure of a complete RNA ligase, Rnl1, in complex with adenosine 5′-(α,β-methylenetriphosphate) (AMPcPP). The N-terminal domain is related to the equivalent region of DNA ligases and Rnl2 and binds AMPcPP but with further interactions from the additional N-terminal 70 amino acids in Rnl1 (via Tyr37 and Arg54) and the C-terminal domain (Gly269 and Asp272). The active site contains two metal ions, consistent with the two-magnesium ion catalytic mechanism. The C-terminal domain represents a new all α-helical fold and has a charge distribution and architecture for helix-nucleic acid groove interaction compatible with tRNA binding.

Bacteriophage T4 RNA ligase 1 (EC 6.5.1.3), the founding member of the RNA ligase family (1), is a very well studied representative of the nucleotidyltransferase superfamily, which includes RNA ligases, DNA ligases, and RNA capping enzymes. All members of this family hydrolyze a pyrophosphate bond of a ribonucleotide triphosphate and make a high energy phosphoramidate linking the nucleotide monophosphate with an essential lysine in the active site. This lysine lies within a conserved KX(D/N)G motif (motif I) and is responsible for the formation of the covalent bond to ATP, NAD, or GTP (2). DNA ligases and RNA capping enzymes share five amino acid sequence motifs: I, III, IIIa, V, and IV (3). Motifs III and IIIa are missing in the Rnl1 sequence (4) but are present in the recently discovered T4 RNA ligase 2 (Rnl2) (5,6). Rnl2 shares greater sequence homology with DNA ligases and RNA capping enzymes than Rnl1 does. This may suggest that Rnl1 has a more specific role than Rnl2, although the function of Rnl2 still remains unknown.
The biological role of Rnl1 involves the countering of a host defense mechanism invoked following phage infection of the bacterial host. The bacterial tRNALys-specific anticodon nuclease (ACNase) is kept latent because of the association of its core protein, PrrC, with the endonuclease EcoprrI, which stabilizes PrrC and masks its activity (7). Upon infection, T4 bacteriophage expresses a T4 Stp peptide (8), which inhibits EcoprrI and activates the latent enzyme. Anticodon nuclease is involved in the 5′ cleavage of the wobble base of tRNALys (9). This modification of tRNALys acts as a defense mechanism by inhibiting phage protein synthesis and, as a consequence, stops the infection. Bacteriophage T4 has developed a counter-defense mechanism using Rnl1 and polynucleotide kinase (PnK) to repair the break in the tRNA anticodon loop. Thus Rnl1 plays an important in vivo role in the spread of the bacteriophage. It has been shown that if PnK and Rnl1 are not present, viral protein synthesis is blocked by depletion of tRNALys (10). A second biological role for Rnl1 has been reported, the promotion of tail fiber attachment (TFA) to the phage baseplate (11). In the absence of Rnl1 the TFA reaction proceeds at a slow rate. Rnl1 can enhance the rate by up to 50-fold (12) but does not affect the final yield of attached tail fibers. Ligase and TFA activities may be mechanistically unrelated because the reaction requirement and the response to some inhibitors are different (11).
Because RNA is more sensitive to degradation than DNA, it has been proposed that polynucleotidyltransferase ancestors catalyzed RNA repair/recombination and then may have evolved into other nucleotidyl ligases and capping enzymes by acquisition of different C-terminal domains. In the Rnl2 structure, there are no amino acid side-chain contacts with the adenine ring, suggesting that changes occur in the active site to achieve GTP specificity in the case of RNA capping enzymes (13).
Rnl1 catalyzes the formation of phosphodiester bonds between the 5′-phosphate and the 3′-hydroxyl termini of single-stranded nucleic acids (14,15). Rnl1 catalysis involves three steps and requires ATP and divalent metals. In the first step, the α-phosphate moiety of ATP reacts with Lys99 to form a covalent ligase-(lysyl-N)-AMP intermediate plus pyrophosphate. Formation of the (ε-amino)-linked adenosine monophosphoramidate is reversible. Secondly, AMP is transferred from the covalent intermediate to a 5′-phosphate RNA end. Finally, the RNA termini are sealed by attack of the RNA 3′-OH, which forms a phosphodiester bond and liberates AMP, a process analogous to mRNA splicing (16,17), although in the latter case two breaks in the RNA are required to excise the intron. The Rnl1 family members are narrowly distributed (4) and include a putative RNA ligase/polynucleotide kinase encoded by the baculovirus Autographa californica nucleopolyhedrovirus (ACNV), tRNA ligases of fungi, and RNA ligase from the bacteriophages RM378 (18) and TS2126 (19).
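The three-step pathway described above can be followed with a small bookkeeping sketch. It is only an illustration of the order of events (where the AMP group sits after each step), not a kinetic or structural model; the class and function names are hypothetical and do not come from the work reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigationState:
    """Toy record of where the AMP group sits during ligation (illustrative only)."""
    amp_location: str = "ATP"   # "ATP", "Lys99", "RNA 5'-phosphate", or "free AMP"
    ppi_released: bool = False  # pyrophosphate leaves in step 1
    rna_sealed: bool = False    # phosphodiester bond formed in step 3


def _require(state: LigationState, expected: str) -> None:
    # Each step is only meaningful if the previous one has happened.
    if state.amp_location != expected:
        raise ValueError(f"expected AMP on {expected}, found {state.amp_location}")


def step1_enzyme_adenylation(state: LigationState) -> LigationState:
    # Lys99 reacts with the alpha-phosphate of ATP; pyrophosphate is released.
    _require(state, "ATP")
    return LigationState("Lys99", True, False)


def step2_rna_adenylation(state: LigationState) -> LigationState:
    # AMP is transferred from the lysyl-AMP intermediate to the RNA 5'-phosphate.
    _require(state, "Lys99")
    return LigationState("RNA 5'-phosphate", state.ppi_released, False)


def step3_phosphodiester_formation(state: LigationState) -> LigationState:
    # The RNA 3'-OH attacks the adenylated 5' end; AMP is liberated.
    _require(state, "RNA 5'-phosphate")
    return LigationState("free AMP", state.ppi_released, True)


def run_ligation() -> LigationState:
    state = LigationState()
    for step in (step1_enzyme_adenylation,
                 step2_rna_adenylation,
                 step3_phosphodiester_formation):
        state = step(state)
    return state
```

Calling a step out of order raises a ValueError, which mirrors the requirement that enzyme adenylation precedes RNA adenylation and sealing.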
We report the crystal structure of the Rnl1-ATP analogue complex at 2.2 Å resolution, the first complete 3′-5′ RNA ligase structure solved; previously only structures of N-terminal fragments of Rnl2 (13) and an editing ligase, TbREL1 (20), have been available. The structure highlights the interactions in the active site between AMPcPP, divalent metals, and the two different domains of the protein. Such data will be of value in the further characterization of the catalytic mechanism and in better understanding the role of the divalent metal cofactor in the enzyme reaction. Moreover, the C-terminal domain shows a new protein fold that is likely to play a role in tRNA binding.
* This work was funded by Federal funds from NCI, National Institutes of Health, under Contract N01-CO-12400 (Article H.36 of the Prime Contract). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of T4 RNA Ligase-The expression plasmid (pNHT4RNAligase) coding for an N-terminal histidine-tagged Rnl1 (MRGSH6GS-Rnl1) and the repressor plasmid (pDM1.1) were both transformed into Rosetta(DE3)pLysS for protein expression. Cultures were grown at 37 °C in Luria broth medium supplemented with 50 μg/ml carbenicillin, 34 μg/ml chloramphenicol, and 50 μg/ml kanamycin. When the A600 reached 0.7, the cells were induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside and incubated for an additional 4 h. Cells were harvested by centrifugation, resuspended in buffer A (50 mM NaH2PO4/Na2HPO4, pH 7.4, 300 mM NaCl) plus 1 mM phenylmethylsulfonyl fluoride, and disrupted by sonication. The supernatant was clarified by high speed centrifugation and applied to a 5-ml nickel chelating column (HiTrap chelating column, Amersham Biosciences) pre-equilibrated with buffer A. T4 RNA ligase was eluted with a 0-500 mM imidazole gradient in buffer A. Ligase-containing fractions were diluted 100-fold in 20 mM Tris/HCl, pH 7.0, 1 mM EDTA (buffer B), applied to a 1-ml anion exchange column (DEAE-Sepharose, Amersham Biosciences) pre-equilibrated with buffer B, and eluted with a gradient of 0-700 mM NaCl in buffer B. Finally, the protein was concentrated and loaded onto a Superdex S200 gel filtration column (Amersham Biosciences) equilibrated with 50 mM Tris/HCl, pH 7.4, 200 mM NaCl. The protein appeared to be monomeric on the elution profile. The peak fractions were concentrated to 14 mg/ml for crystallization. Selenomethionine-substituted protein was expressed using the methionine auxotroph strain B834 and purified using the same protocol.
Crystallization and Data Collection-Rnl1 was screened in a total of 2688 droplets set up in 96-well Greiner plates using a Cartesian Technologies pipetting robot (672 standard conditions of Hampton, Wizard, and Emerald kits for the apoform and four different ATP analogues with MgCl2). Crystals of the complex with AMPcPP were observed growing from polyethylene glycol 3350 and 0.2 M CaCl2. Selenomethionine-labeled protein did not crystallize unless 5 mM dithiothreitol was added to the reservoir solution.
X-ray diffraction data for selenomethionine-labeled T4 RNA ligase-AMPcPP crystals were collected to 2.2 Å in-house. Three-wavelength MAD data (peak, remote, and inflection) for selenomethionine-substituted ligase crystals were collected at the European Synchrotron Radiation Facility Beamline BM14 to 2.1 Å. Indexing and integration of data images were carried out with HKL2000 (MAD data set) or DENZO (in-house data set), and data were merged using SCALEPACK (21). Two related crystal forms, belonging to either the space group P21 or C2, were identified. As the P21 form gave better data quality, we used this for all further studies. The cell dimensions of the P21 form are a = 105.1 Å, b = 39.9 Å, c = 108.5 Å, and β = 117.3°, with two molecules per asymmetric unit. The statistics for x-ray data collection are given in Table 1.
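As a consistency check on the reported P21 cell, the unit cell volume and an approximate Matthews coefficient can be worked out from the cell dimensions given above. The sketch below is illustrative only: the molecular mass of about 44 kDa for the tagged protein is an assumed round figure, not a value stated in the text, and the variable names are hypothetical.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Volume (in cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume: float, molecules_per_cell: int, mass_da: float) -> float:
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return volume / (molecules_per_cell * mass_da)


# Cell dimensions of the P21 form as reported in the text.
CELL_VOLUME = monoclinic_cell_volume(105.1, 39.9, 108.5, 117.3)

# P21 has two symmetry-equivalent positions; with two molecules per
# asymmetric unit this gives four molecules in the cell.
MOLECULES_PER_CELL = 2 * 2

# ASSUMPTION: roughly 44 kDa for the His-tagged 374-residue protein.
ASSUMED_MASS_DA = 44_000.0

VM = matthews_coefficient(CELL_VOLUME, MOLECULES_PER_CELL, ASSUMED_MASS_DA)
```

Under the assumed mass, the cell volume comes out at roughly 4.0 x 10^5 Å3 and the Matthews coefficient near 2.3 Å3/Da, which is compatible with two molecules per asymmetric unit.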
Structure Solution and Refinement-Following data preparation with SHELXC, SHELXD (22) identified selenium sites, which were fed into SOLVE (23); a total of 20 sites were then located. Automated model building was performed with RESOLVE and ARP/wARP, resulting in 79% of the residues being placed (38% of amino acids assigned). The remainder of the model was built manually using the program O (24). The structure was refined in CNS (crystallography NMR software) (25) using simulated annealing and B-factor refinement. Initially, the MAD selenomethionine T4 RNA ligase structure was rebuilt after the initial RESOLVE model and subjected to a few rounds of refinement, but subsequent work used the in-house data set, which, although of slightly lower resolution (2.2 Å compared with 2.0 Å), was overall of better quality. Phasing and refinement statistics are shown in Table 1. tRNAGlu from Escherichia coli (Protein Data Bank ID code 1G59) was docked on to the surface of Rnl1 using O.

RESULTS AND DISCUSSION
Overall Structure of Rnl1-The refined crystal structure at 2.2 Å resolution (see Table 1 for data collection, phasing, and refinement statistics) showed that the 374-amino acid Rnl1 was organized into a two-domain structure. The N-terminal region includes an α helix (α1) followed by an antiparallel β-sheet (β1-β4) (Fig. 1A). Although this sheet

T4 RNA Ligase Structure
other members of the Rnl2-like protein family. Although the β-strands containing the nucleotidyltransferase motifs are very well superimposed, the α-helices are positioned more divergently (Fig. 2A). Although Rnl1 shares the same nucleotidyltransferase domain architecture with capping enzymes and DNA ligases, the C-terminal domain structure is very different among members of this family. Capping enzymes and DNA ligases have an OB-fold domain that flanks the nucleotidyltransferase domain; this OB-fold is absent from Rnl1 and Rnl2. The secondary structure of the C-terminal Rnl1 domain consists entirely of α-helices (Fig. 1, A and C). There is a very long α-helix (α-9) of about 30 residues that interacts with a series of four α-helices that lie almost parallel to one another but at an angle of ~45° to α-9. DALI (26) searches did not find any structural fold in the Protein Data Bank similar to this C-terminal domain. BLASTP revealed that some proteins of unknown three-dimensional structure, such as the RnlA RNA ligase from bacteriophage RB69 or from bacteriophage 44RR2.8t, contain domains that share a high sequence identity (61 and 40%, respectively) with this C-terminal region of Rnl1 and thus are expected to adopt the same protein fold. The TFA activity of Rnl1 may be related to the structural differences between it and other ligases. Although the second role of Rnl1 may have affected the evolution of the protein structure, it is difficult to draw any particular inferences about which structural features promote TFA activity.
ATP Binding Site-Analysis of the nucleotide binding site shows that both the N- and C-terminal domains are involved in forming interactions with ATP. The N-terminal domain makes hydrogen bond interactions via the side-chain hydroxyl group of Tyr37 to one phosphoryl oxygen from the γ-phosphate (Fig. 2B). The guanidinium group of Arg54 is linked by three hydrogen bonds to the AMPcPP, by two to the β-γ oxygen, and by one to the ribose 3′-OH. Arg54 and Gly55 are conserved in all Rnl1-like proteins. Site-directed mutagenesis has indicated that Arg54 is essential for RNA adenylation (step 2) but not for enzyme adenylation or formation of the phosphodiester bond (4).
The β-strands of the core region contain most of the residues involved in binding the ATP analogue. Lys75, Lys99 (motif I), Lys119 (motif Ia), Lys240 (motif V), and Lys242 (motif V) interact with the phosphate/phosphonate groups. Whereas Lys99 was found to be the site of adenylation in Rnl1 by fast atom bombardment-mass spectrometric analysis (27), in our structure the α-phosphonate is more than 3 Å from the Lys99 Nζ (Fig. 2, B and C). This relative positioning of α-phosphonate and Lys99 is presumably because of the presence of the methylene link in the ATP analogue but may also suggest that some conformational changes have to occur to allow formation of a covalent bond. A similar situation is observed in T7 DNA ligase, the Enterococcus faecalis DNA ligase, and in the Rnl2 structure, where the lysine catalytically equivalent to Lys99 is situated at a distance incompatible with covalent interaction with the phosphate. Moreover, Odell et al. (28) as well as Shuman and Schwer (3) have suggested from the Chlorella virus DNA ligase-adenylate intermediate structure that a change in nucleotide base conformation, from syn to anti, accompanies the covalent attachment of the lysine. Indeed, it seems that a noncovalently bound ligand (ATP or NAD) in an RNA or DNA ligase has the nucleotide base in a syn conformation, whereas the conformation is anti if the covalent bond is made. This trend is supported by the Rnl1 structure where the AMPcPP adenine ring is in a syn conformation. It would, however, be difficult to infer whether the conformational change has to occur to permit the covalent bond to form or whether it is a consequence of covalent bond formation. Glu159 is conserved in the Rnl1 family members but is not found in other nucleotidyltransferase sequence motifs. Glu159 interacts with the ribose 2′-OH exactly as does Rnl2 Glu99 (Fig. 2C).
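The syn/anti distinction discussed above is conventionally assigned from the glycosidic torsion angle χ, which for a purine is defined by the atoms O4′-C1′-N9-C4; by the usual convention, χ within 0 ± 90° is syn and χ within 180 ± 90° is anti. The following sketch shows how such a classification could be made from four atomic positions. It is a generic geometry helper, not the procedure used in this work, and the coordinates in the usage note are synthetic placeholders rather than values from the Rnl1 model.

```python
import math

Vec = tuple[float, float, float]


def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def classify_glycosidic(chi_deg: float) -> str:
    """Return 'syn' for chi within 0 +/- 90 degrees, otherwise 'anti'."""
    return "syn" if abs(chi_deg) <= 90.0 else "anti"
```

For example, with the synthetic points (1, 0, 0), (0, 0, 0), (0, 0, 1), and (1, 0, 1) standing in for O4′, C1′, N9, and C4, the four atoms are eclipsed and the base would be classified as syn; moving the last point to (-1, 0, 1) gives the anti arrangement.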
No side-chain contacts are made with the adenine ring, but rather main-chain interactions are present via two hydrogen bonds, i.e. one with the main-chain carbonyl of residue Thr98 and one with a water molecule stabilized by Leu179 and Asn180 (main-chain CO and side-chain CO, respectively) (Fig. 2B). Because the adenine ring only makes contacts with the protein main chain in Rnl2 as in Rnl1, it has been proposed that the nucleotidyltransferase ancestor was ATP-dependent and could have evolved into GTP- or NAD-dependent enzymes by acquiring side chains in the active site responsible for ligand specificity (13).
It has also been suggested, by comparison with the Chlorella virus motif IV Glu161, that Glu227 from motif IV coordinates a divalent cofactor ion (3,4). In our structure, Glu227 interacts indirectly with Ca2+ via two water molecules. These are the only interactions between the protein and the metal ion, which is coordinated to six water molecules and the AMPcPP α-phosphonate. Interestingly, as the Glu227 carboxyl group is situated 2.6 Å from Lys99 Nζ, this may indicate that the residue plays a role in the correct orientation of the lysine responsible for forming the covalent adenyl intermediate. In the Rnl2 structure, the same interaction is observed between Glu204 and Lys35 (Fig. 2C).
There are no hydrogen bond contacts between AMPcPP and the C-terminal region, but there is an interaction between the carbonyl of Gly269 and the Mg2+ ion. The latter ion is also coordinated to the side-chain carboxyl of the conserved Asp272. Asp272 allows the correct placement of metal and thus the correct orientation of the phosphate.
Divalent Cation Sites-Using transient optical absorbance and fluorescence spectroscopy on T4 Rnl1 and DNA ligases, Cherepanov and de Vries (29) have proposed that nucleotidyltransferase catalysis proceeds via a two-metal ion mechanism. It has been demonstrated that metal ions such as Mg2+ are essential for the joining of nucleic acids by T4 DNA and T4 RNA ligases. These enzymes react via the same mechanism, although DNA ligase shows a higher rate of ATP binding. One reason for this difference is the decreased Kd for Mg2+. Studies on nick sealing by Rnl1 (and also by T4 DNA ligase) have shown that it cannot bind ATP-Mg2 directly, but rather that ATP-Mg binds first and subsequently a second Mg2+ ion. ATP-Mg2 is thus the true substrate in the adenylation reaction (29). The reverse reaction (ATP formation) is also possible using Mg-P2O7 as a substrate.
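The ordered binding described above (ATP-Mg first, then a second Mg2+) can be illustrated with a simple equilibrium sketch. Assuming two hypothetical dissociation constants, kd1 for enzyme plus ATP-Mg and kd2 for the second Mg2+, the fraction of enzyme carrying ATP-Mg2 is L·M / (kd1·kd2 + kd2·L + L·M), where L and M are the free concentrations of ATP-Mg and Mg2+. The function name and the constants are assumptions for illustration; no such values are given in the text, and the kinetic scheme in reference 29 may be more detailed than this equilibrium treatment.

```python
def fraction_enzyme_with_atp_mg2(atp_mg: float, mg: float,
                                 kd1: float, kd2: float) -> float:
    """Equilibrium fraction of enzyme in the E.ATP-Mg2 state for ordered binding.

    Scheme (illustrative assumption):
        E + ATP-Mg        <-> E.ATP-Mg     (dissociation constant kd1)
        E.ATP-Mg + Mg2+   <-> E.ATP-Mg2    (dissociation constant kd2)

    All concentrations and constants must share the same units.
    """
    if atp_mg < 0 or mg < 0:
        raise ValueError("concentrations must be non-negative")
    if kd1 <= 0 or kd2 <= 0:
        raise ValueError("dissociation constants must be positive")
    doubly_loaded = atp_mg * mg
    return doubly_loaded / (kd1 * kd2 + kd2 * atp_mg + doubly_loaded)
```

In this scheme the doubly loaded state cannot form without ATP-Mg already bound, so the fraction is zero whenever either ligand is absent and rises toward one only when both are in excess of their respective constants.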
The Rnl1 structure shows Ca2+ (temperature factor = 22.1 Å2) coordinated to six water molecules and also interacting with one phosphoryl oxygen of the AMPcPP α-phosphonate (metal ion site A). A Mg2+ (temperature factor = 11.5 Å2) interacting with one phosphoryl oxygen of the β-phosphate is also present (metal ion site B) (Figs. 1B and 2B). Rnl1 only crystallizes in the presence of Ca2+, which can be explained by the presence of two symmetry-related Ca2+ ions at the crystallographic dimer interface. The positive charge of the Ca2+ ions is presumed to allow the interaction between the negatively charged surfaces of two Rnl1 molecules (Fig. 3A). Each of these Ca2+ ions situated at the interface is coordinated to residue side chains belonging to each crystallographic monomer.
The presence of a Ca2+ ion is quite surprising in site A. It could be explained by the much higher concentration of CaCl2 compared with MgCl2 (200-fold molar excess). However, a Mg2+ ion is also bound under these conditions at site B, so we can infer that the latter Mg2+ site is more specific and allows only certain types of coordination. Indeed, as mentioned previously, the site A Ca2+ does not directly interact with the ligase but via water molecules interacting with Glu227, Glu159, Lys99, Glu100, and Tyr246 (Fig. 2B). In the Rnl2 structure, Glu204 superimposes with Rnl1 Glu227. However, although the Rnl1 Ca2+ ion is replaced by water, three of the Ca2+-interacting waters are present in Rnl2. From this we conclude that the divalent metal site A is in fact the same in both Rnl1 and Rnl2 and that Glu204 of Rnl2 also interacts via water molecules with Mg2+. Metal site B Mg2+ forms a closer interaction with the AMPcPP β-phosphoryl oxygen (2.2 Å) than with the γ-phosphate oxygen, which is 3.5 Å away. It also forms an interaction with three water molecules and two residues (Gly269 and Asp272), which belong to the Rnl1 C-terminal domain. Ho et al. (13) have inferred that the Rnl2 C-terminal domain is specifically required for the second step of the ligation (AMP transfer to the 5′-PO4 RNA) and that the domain is an obstacle to adenosine diphosphate RNA binding or sealing. Because both enzymes share a similar catalytic mechanism and because the Rnl1 C-terminal domain is important in stabilizing metal site B, we suggest that site B Mg2+ also plays a role in the second step of ligation, most likely by interacting with the negatively charged RNA backbone.
RNA Binding Site-Despite significant efforts screening with a range of ligands including ATP, AMPcPP, or AMP, we were unable to crystallize an RNA-Rnl1 complex. Rnl1 seems to crystallize only with AMPcPP, which, on the basis of the proposed ping-pong mechanism, is incompatible with the presence of RNA in the active site.
Analysis of charge distribution on the protein surface gives an indication of a possible site of RNA binding. The majority of the protein surface is negatively charged, apart from the C-terminal domain, which presents positive charges that could interact with the RNA backbone phosphates (Fig. 3A). Furthermore, we know that the RNA should be in close proximity to the ATP binding site to enable the AMP transfer from Lys99. Docking studies (Fig. 3B) suggest that RNA binds at the surface of the C-terminal domain, thereby positioning the anticodon loop toward the ATP binding site in the core region. Interestingly, the C-terminal architecture of the parallel α-helices (α-7 and α-10 to α-12) positioned along the α-9 helix matches the tRNA groove architecture. As mentioned earlier, the Rnl1 C-terminal domain is unique among nucleotidyltransferase family members, suggesting that its topology evolved to bind tRNALys specifically in vivo.
In the active site, a chloride ion (temperature factor = 17.8 Å2) is positioned 3.1 Å from the ribose 3′-OH and at the same distance from a Ca2+-coordinated water molecule (Fig. 2B). The negative charge of the chloride may mimic the 5′-phosphate of the incoming RNA as proposed for the sulfate ion in the Chlorella virus DNA ligase structure (28,30). The chloride ion is situated above the AMP moiety on the active site face, which is accessible to RNA, and is sterically compatible with the position of a nucleic acid phosphate. Wang et al. (4) have reported the specific function of Arg54 and Lys119 in the RNA adenylation reaction. In our structure, Arg54 is 3.5 Å from the chloride ion and would be even closer to a phosphoryl oxygen if a phosphate occupied the same site. Thus, we suggest that Arg54 stabilizes and orientates the RNA phosphate by hydrogen bonding in the second step of the reaction. However, as mentioned above, Arg54 is also involved in the first step of the reaction, stabilizing the ribose 3′-OH and the γ-phosphate, although it has been shown that when it is mutated to alanine, partial activity is retained in the first step, whereas activity is abolished in the second step of the reaction (4). This raises the question of why such a seemingly important residue is not conserved in other nucleotidyltransferases. By superimposing the structures of Rnl1 and Rnl2, we observed that the position of the guanidinium group of Arg54 is quite close to that of the corresponding group of Rnl2 Arg55 (Fig. 2C). Arg55 is conserved in Rnl2-like proteins and is essential for catalysis. Interestingly, the chloride ion is replaced in the Rnl2 structure by a water molecule 2.6 Å from Arg55. Thus we postulate that Arg54, which is conserved in Rnl1-like proteins and belongs to a domain not present in Rnl2, plays the same role as the conserved Arg55, i.e. that of orientating the RNA phosphate in the active site.
Lys119 has been shown to be as important for the second catalysis step as Arg54, because mutation of either residue interferes with adenylation of RNA (4). Although the Rnl1 structure does not directly indicate its role, we nevertheless suggest that Lys119, based on its placement at the entrance of the active site and the position of the chloride ion, may interact with RNA. Lys119 may interact with a 3′-hydroxyl ribose group, for example, and thus could be involved in the 5′-P RNA recognition.
Although we have to keep in mind that AMPcPP might not bind exactly as ATP does because of its methylene group, and that Lys99 has to be deprotonated to make a covalent bond, the Rnl1 structure still provides information about nucleotide binding prior to the adenylation reaction. The structure also provides clues about the second step of ligation. We suggest, as in the case of the Chlorella virus DNA ligase, that the chloride ion mimics the 5′-P-RNA. We also suggest that Arg54, essential for the second step reaction, catalytically corresponds to the conserved Arg55 of Rnl2. Because all five nucleotidyltransferase motifs are present in Rnl2, it has been proposed that Rnl2 is homologous to the nucleotidyltransferase family ancestor (5); thus we infer that acquisition of the additional N-terminal residues, and especially Arg54, was an adaptation for Rnl1 to replace the missing Rnl2 Arg55. Moreover, the structure confirms the presence of two divalent metal ions in the active site and shows the essential interacting residues. Both metals interact with the ATP analogue and water molecules, but the site B metal also interacts with two residues of the Rnl1 C-terminal domain. The latter domain is a completely new protein fold, which presents a structure and a surface charge profile compatible with tRNA binding. The work reported here has thus provided structural information for a complete 3′-5′ RNA ligase.