Crystal Structure and Nonhomologous End-joining Function of the Ligase Component of Mycobacterium DNA Ligase D*

DNA ligase D (LigD) is a large polyfunctional enzyme involved in nonhomologous end-joining (NHEJ) in mycobacteria. LigD consists of a C-terminal ATP-dependent ligase domain fused to upstream polymerase and phosphoesterase modules. Here we report the 2.4 Å crystal structure of the ligase domain of Mycobacterium LigD, captured as the covalent ligase-AMP intermediate with a divalent metal in the active site. A chloride anion on the protein surface coordinated by the ribose 3′-OH and caged by arginine and lysine side chains is a putative mimetic of the 5′-phosphate at a DNA nick. Structure-guided mutational analysis revealed distinct requirements for the adenylylation and end-sealing reactions catalyzed by LigD. We found that a mutation of Mycobacterium LigD that ablates only ligase activity results in decreased fidelity of NHEJ in vivo and a strong bias of mutagenic events toward deletions instead of insertions at the sealed DNA ends. This phenotype contrasts with the increased fidelity of double-strand break repair in ΔligD cells or in a strain in which only the polymerase function of LigD is defective. We surmise that the signature error-prone quality of bacterial NHEJ in vivo arises from a dynamic balance between the end-remodeling and end-sealing steps.

DNA ligases are essential for the maintenance of genome integrity. As nick-sealing enzymes, they are responsible for joining the Okazaki fragments on the lagging strand of the DNA replication fork and for restoring the continuity of the DNA backbone subsequent to nucleotide excision and base excision repair. Nick sealing entails a series of three nucleotidyl transfer reactions (1). In the first step, a lysine nucleophile in the ligase active site attacks the α-phosphorus of ATP or NAD+ to form a covalent lysyl-N-AMP intermediate and expel either pyrophosphate or nicotinamide mononucleotide. In the second step, the 5′-phosphate of the DNA nick attacks the phosphorus of lysyl-AMP to form a DNA-adenylylate (A(5′)pp(5′)DNA) intermediate and expel the lysine. The third step is the attack of the nick 3′-OH on the DNA-adenylylate to form a 3′-5′-phosphodiester and release AMP.
Many organisms have more than one DNA ligase, suggesting that a division of labor has evolved whereby individual ligase isozymes have taken on specialized functions in DNA replication, repair, and recombination. This specialization is often correlated with the fusion of new structural modules to the "ancestral" ligase catalytic core, either upstream of the NT domain or downstream of the OB domain, or both (reviewed in Ref. 20). Eukaryal organisms typically have at least two DNA ligases, of which one is devoted to DNA replication/repair functions (e.g. mammalian DNA ligase I or its yeast homolog Cdc9), whereas the other ligase (named DNA ligase IV) is dedicated to the nonhomologous end-joining (NHEJ) pathway of double-strand break repair (21-23). The specialization of eukaryal ligases I and IV has progressed to the degree that neither protein can perform the in vivo functions normally executed by the other (24). This situation reflects a loss of pluripotency, insofar as a minimal eukaryotic viral DNA ligase containing only the core NT and OB modules is capable of performing an apparently full repertoire of replicative, repair, and NHEJ functions in yeast when it is the only ligase available (15,25).
Recent studies have highlighted an unexpected complexity of the DNA ligase menu in bacteria. Although all bacterial species have an essential NAD+-dependent DNA ligase (LigA), quite a few have either a second NAD+-dependent ligase (e.g. Escherichia coli, Salmonella typhimurium, Shigella flexneri, Yersinia pestis, and Pseudomonas putida) (26) or an additional ATP-dependent ligase (e.g. Haemophilus influenzae, Pseudomonas aeruginosa, and Neisseria gonorrhoeae) (27-29). At the extreme end of the spectrum are Mycobacterium tuberculosis and Mycobacterium smegmatis, which have multiple nonessential ATP-dependent DNA ligases (named LigB, LigC, and LigD) in addition to the NAD+-dependent LigA enzyme (30,31). Interest in LigD is fueled by evidence that it functions together with a bacterial Ku homolog in an NHEJ pathway characterized by a high incidence of frameshift mutations at the sites of double-strand break (DSB) repair (30-34). The fact that LigD and Ku are jointly present in the proteomes of many diverse bacteria hints that NHEJ is broadly relevant to bacterial physiology, perhaps as a mechanism to repair chromosome breaks in nondividing cells.
LigD is distinguished from all other DNA ligases insofar as its enzymatic activity is not limited to sealing DNA strands. Rather, it is a polyfunctional enzyme consisting of an ATP-dependent ligase (LIG) domain fused to a polymerase (POL) domain and a phosphoesterase module (29-31, 33-36). Biochemical characterization of the POL and phosphoesterase components suggests that they provide a means of remodeling the 3′ ends of broken DNA strands prior to sealing by the LIG component. The LigD POL domain catalyzes either nontemplated single-nucleotide additions to a blunt-ended duplex DNA or fill-in synthesis at a 5′-tailed duplex DNA; these are the molecular signatures of mutagenic mycobacterial NHEJ in vivo at blunt-end and 5′-overhang DSBs, respectively (31). Although LigD is a relatively poor nick-sealing enzyme in vitro compared with LigA or LigB, its strand joining activity is stimulated by Ku, which interacts physically with LigD, likely via the LigD POL domain (30,31,33,34,37).
To gain a better understanding of LigD and its role in NHEJ, we have begun to determine the structures and structure-activity relationships of the three component domains and to interrogate genetically their contributions to the efficiency and fidelity of DSB repair in vivo. The crystal structure of the POL domain of Pseudomonas LigD revealed a minimized polymerase with a two-metal mechanism and a fold similar to that of archaeal DNA primase (38). The role of the LigD POL during NHEJ in vivo was examined by allelic replacement in M. smegmatis. A mutant ligD gene encoding a polymerase-defective, ligase-active full-length MsmLigD protein was introduced at the ligD locus. Ablating the polymerase activity resulted in increased fidelity of blunt-end DSB repair in vivo by virtue of eliminating nucleotide insertions at the recombination junctions (38). Thus, LigD POL is a direct catalyst of +1 frameshifting during NHEJ in vivo.
Here we focus on the LIG domain of Mycobacterium LigD. We report the 2.4 Å crystal structure of the ligase-AMP intermediate with a divalent cation cofactor and a chloride anion (a putative mimetic of the DNA 5′-phosphate) coordinated in the active site. Structure-guided mutational analysis illuminates distinct roles for individual side chains during the three steps of the ligation reaction. We report surprising effects on NHEJ outcomes in M. smegmatis when only the LIG function of LigD is ablated.

EXPERIMENTAL PROCEDURES
LigD Mutants-Plasmid pET-MtuLigD encodes the M. tuberculosis LigD polypeptide fused to an N-terminal His10 tag (30). Alanine mutations were introduced into the ligD gene by the two-stage PCR overlap extension method (62) using pET-MtuLigD as the template. The PCR products were digested with NcoI and BamHI and inserted into pET16b (Novagen). The inserts of the mutant pET-MtuLigD-Ala plasmids were sequenced completely to exclude the acquisition of unwanted changes during amplification and cloning.
LigD Purification-Wild-type and Ala mutant pET-MtuLigD plasmids were transformed into E. coli BL21(DE3). One-liter cultures of E. coli BL21(DE3)/pET-MtuLigD were grown at 37°C in Luria-Bertani medium containing 0.1 mg/ml ampicillin until the A600 reached ~0.6. The cultures were chilled on ice for 30 min, adjusted to 0.1 mM isopropyl β-D-thiogalactoside and 2% ethanol, and then incubated at 17°C for 16 h with continuous shaking. Cells were harvested by centrifugation, and the pellet was stored at −80°C. All subsequent procedures were performed at 4°C. Thawed bacteria were resuspended in 30 ml of buffer A (50 mM Tris-HCl, pH 7.5, 2 M NaCl, 10% sucrose). Lysozyme and Triton X-100 were added to final concentrations of 100 μg/ml and 0.1%, respectively. The lysates were sonicated to reduce viscosity, and insoluble material was removed by centrifugation. The soluble extracts were applied to 1-ml columns of nickel-nitrilotriacetic acid-agarose (Qiagen, Chatsworth, CA) that had been equilibrated with buffer A. The columns were washed with 8 ml of the same buffer and then eluted stepwise with 4-ml aliquots of buffer B (50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 10% glycerol) containing 25, 50, and 200 mM imidazole. Polypeptide compositions of the column fractions were monitored by SDS-PAGE. LigD was recovered predominantly in the 200 mM imidazole fraction. The wild-type and mutant LigD preparations were stored at −80°C. Protein concentrations were determined by SDS-PAGE analysis of serial dilutions of the LigD preparations in parallel with serial dilutions of a BSA standard. The gels were stained with Coomassie Blue, and the staining intensities of the LigD and BSA polypeptides were quantified using a Digital Imaging and Analysis System from Alpha Innotech Corp. LigD concentrations were determined by interpolation to the BSA standard curve.
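The interpolation to a BSA standard curve described above can be illustrated with a short calculation. The sketch below is not the software used in this study; it assumes a linear relationship between band intensity and protein mass, and every number, the function names `fit_line` and `interpolate_amount`, and the dilution scheme are hypothetical values chosen only to show the arithmetic.

```python
# Sketch: estimate protein amount from Coomassie band intensity by
# interpolating against a BSA dilution series. All numbers are hypothetical.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_amount(intensity, slope, intercept):
    """Invert the standard curve to get micrograms of protein in a band."""
    return (intensity - intercept) / slope


# Hypothetical BSA standards: micrograms loaded vs. integrated band intensity.
bsa_ug = [0.5, 1.0, 2.0, 4.0]
bsa_intensity = [1000.0, 2000.0, 4000.0, 8000.0]
slope, intercept = fit_line(bsa_ug, bsa_intensity)

# Hypothetical LigD band: 2 microliters of a 1:5 dilution gave intensity 3000.
ligd_ug = interpolate_amount(3000.0, slope, intercept)
ligd_mg_per_ml = ligd_ug / 2.0 * 5  # ug/ul is numerically equal to mg/ml
```

Only bands whose intensities fall within the range spanned by the standards should be interpolated in this way; values outside that range would be extrapolations.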
LIG Domain Purification-A gene segment encoding the MtuLigD LIG domain (amino acids 452-759) was cloned into a pET-28b plasmid so as to fuse the LIG open reading frame to a leader sequence encoding an N-terminal hexahistidine tag followed by a tobacco etch virus protease cleavage site. The resulting plasmid (pAEB1120) was transformed into E. coli BL21(DE3) codon plus (RIL). A culture derived from a single transformant was grown to an A600 of 0.4-0.6 at 37°C, at which point production of the LIG domain was induced by incubation for 3-5 h after addition of 0.5 mM isopropyl β-D-thiogalactoside. Cells were harvested by centrifugation, resuspended in lysis buffer (50 mM Tris-HCl, pH 7.5, 300 mM NaCl, 10% glycerol, 1 mM phenylmethylsulfonyl fluoride, and 1 μM pepstatin A), and stored at −80°C. Production of selenomethioninyl LIG domain was performed according to Van Duyne et al. (39).
For purification, thawed cells were sonicated and centrifuged, and soluble His-tagged LIG was purified by nickel-affinity chromatography (His-Trap, Amersham Biosciences). LIG-containing fractions were pooled and subjected to cleavage of the His tag by hexahistidine-tagged tobacco etch virus protease, followed by a second nickel affinity step. The tag-free LIG domain recovered in the flow-through was concentrated by centrifugal filtration (Centriprep-10; Amicon) and further purified using an S-200 gel filtration column (Amersham Biosciences). The elution profile of the LIG protein was consistent with a monomeric quaternary structure (data not shown). SDS-PAGE analysis indicated that the preparation was greater than 95% pure. Electrospray ionization mass spectrometry indicated that the major purification product had a mass 331 Da larger than expected from the polypeptide sequence. From this datum, we surmised that the major portion of the purified protein was in the adenylylated form.
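The inference of adenylylation from the observed mass shift rests on simple arithmetic: covalent attachment of AMP to a lysine adds the mass of AMP minus one water. The following sketch makes that comparison explicit. The average masses are standard values for AMP and water, whereas the 5-Da tolerance and the function names are illustrative assumptions, not parameters reported in this study.

```python
# Sketch: compare an observed mass shift with the shift expected for a
# covalent lysyl-AMP adduct. The tolerance is a hypothetical choice.

AMP_AVG_MASS = 347.22    # Da, free AMP (C10H14N5O7P), average mass
WATER_AVG_MASS = 18.02   # Da, lost on forming the P-N bond


def adenylyl_adduct_mass():
    """Mass added to a protein by one covalently attached AMP."""
    return AMP_AVG_MASS - WATER_AVG_MASS


def consistent_with_adenylylation(observed_shift, tolerance=5.0):
    """True if the observed shift lies within the tolerance of the adduct mass."""
    return abs(observed_shift - adenylyl_adduct_mass()) <= tolerance


expected = adenylyl_adduct_mass()                 # about 329.2 Da
matches = consistent_with_adenylylation(331.0)    # shift reported in the text
```

Under these assumptions, the reported 331-Da shift differs from the expected adduct mass by less than 2 Da.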
Crystallization of the LIG Domain-Purified LIG domain was dialyzed against a buffer containing 10 mM Tris-HCl, pH 7.5, 200 mM NaCl, 5 mM magnesium acetate, 1 mM ATP, and 2 mM DTT for 3-5 h at room temperature. ATP and magnesium acetate were included in the dialysis conditions to further ensure that the protein was fully adenylylated. Protein concentration was then determined by A280 in a 6 M guanidine-HCl solution (40), and crystallization trays were set at concentrations ranging from 10 to 15 mg/ml. Dialyzed LIG was mixed in a 2:1 (v/v) ratio with a precipitant solution containing 6% PEG 3000, 25 mM ZnCl2, and 100 mM sodium acetate, pH 4.6. After incubating on ice for 20 min, the mixture was centrifuged for 10 min at 4°C, and the clarified supernatant was equilibrated at 18°C by the hanging drop method against a well solution that contained 500 μl of precipitant solution and 200-500 μl of dialysis buffer. Crystals typically took 3-7 days to reach maximal size. Crystal-containing trays were slow-cooled to 4°C for 24 h, after which crystals were transferred stepwise through precipitant solutions supplemented with 5, 10, 15, 20, and 25% xylitol. Crystals were allowed to equilibrate in 25% xylitol cryoprotectant for at least 1 h before harvesting and flash-freezing.
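The nominal composition of the mixture immediately after combining protein and precipitant at 2:1 (v/v) follows from volume-weighted averaging. The sketch below uses the buffer components stated above and the lower end of the protein concentration range (10 mg/ml); the helper `mix` is a hypothetical name, and the calculation ignores any material lost in the centrifugation step and any change during vapor equilibration.

```python
# Sketch: nominal concentrations after mixing two solutions by volume.
# Ignores precipitation, centrifugation losses, and vapor equilibration.

def mix(solution_a, solution_b, parts_a, parts_b):
    """Return component concentrations after mixing parts_a:parts_b (v/v)."""
    total = parts_a + parts_b
    names = sorted(set(solution_a) | set(solution_b))
    return {
        name: (solution_a.get(name, 0.0) * parts_a
               + solution_b.get(name, 0.0) * parts_b) / total
        for name in names
    }


# Dialyzed protein solution (lower end of the 10-15 mg/ml range).
protein_solution = {
    "LIG (mg/ml)": 10.0,
    "Tris-HCl (mM)": 10.0,
    "NaCl (mM)": 200.0,
    "magnesium acetate (mM)": 5.0,
    "ATP (mM)": 1.0,
    "DTT (mM)": 2.0,
}

# Precipitant solution as stated in the text.
precipitant = {
    "PEG 3000 (%)": 6.0,
    "ZnCl2 (mM)": 25.0,
    "sodium acetate (mM)": 100.0,
}

drop = mix(protein_solution, precipitant, parts_a=2, parts_b=1)
for name, value in drop.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the mixture starts at 2% PEG 3000 and roughly 8.3 mM ZnCl2, with the protein diluted to two-thirds of its dialyzed concentration.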
Data Collection and Structure Solution-X-ray diffraction data were collected at beamline 8.3.1 at the Advanced Light Source, Lawrence Berkeley National Laboratory (41). Data were processed using the HKL2000 suite (42). Collection and processing statistics are given in Table 1. SOLVE and RESOLVE (43,44) were used to find the selenium sites, calculate experimental phases, and generate and refine initial electron density maps. RESOLVE was able to build 90% of the main chain atoms. Model building was carried out with O (64), and the model was refined with REFMAC5 (45). Coordinates have been deposited in the Protein Data Bank under accession code 1VS0.
Construction of a LIG-defective LigD Strain of M. smegmatis-Plasmid pMSG346 (marked with selectable and counterselectable hygromycin and sacB genes) was designed to facilitate allelic exchange at the ΔligD locus of the M. smegmatis null mutant described previously (38). pMSG346 contains 503 bp of genomic DNA 5′ of the ligD open reading frame and 490 bp of genomic DNA 3′ of the open reading frame, with an NdeI site introduced at the start codon and a BamHI site introduced 37 bp 3′ of the stop codon. The ligD-(K484A) gene encoding a LIG-defective, POL-active M. smegmatis LigD protein was inserted between the NdeI and BamHI sites of pMSG346. The resulting plasmid was transformed into the ΔligD strain. Allelic exchange was executed by the two-step selection/counterselection strategy (31). A control allelic exchange was performed using pMSG346 containing a wild-type M. smegmatis ligD insert. Restoration of the ligD locus was confirmed by Southern hybridization. The K484A mutation was verified by PCR-amplifying and sequencing the LIG coding region from the LIG-defective ligD strain.
Plasmid-based NHEJ-Assays were performed as described previously (31, 38) using a kanamycin resistance plasmid pMSG288 that was linearized at the EcoRV or Asp718I site within a lacZ gene. For calculation of fidelity values presented in Fig. 7B, the blue and white colony counts from three independent experiments (comprising nine independent transformations) were pooled. The total numbers (n) of colonies scored for the EcoRV-cut plasmid were as follows: wild-type ligD, n = 9194; ligD-K484A, n = 1850; ΔligD, n = 820. The numbers of colonies scored for the Asp718I-cut plasmid were as follows: wild-type ligD, n = 7514; ligD-K484A, n = 2239; ΔligD, n = 1646. NHEJ efficiencies are normalized to that of the wild-type strain (defined as 100%); the data shown are the mean values of three independent transformations.
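The pooling of colony counts and the normalization of efficiencies can be sketched as follows. This is an illustration rather than the analysis script used here: it assumes that fidelity is scored as the percentage of blue (lacZ-restored) colonies among all colonies counted, and the per-experiment counts, transformant yields, and function names are hypothetical.

```python
# Sketch: pooled fidelity from blue/white counts and efficiency normalized
# to wild type. Assumes fidelity = blue / (blue + white); data are hypothetical.

def pooled_fidelity(experiments):
    """experiments: list of (blue, white) counts. Returns (fidelity %, n)."""
    blue = sum(b for b, _ in experiments)
    white = sum(w for _, w in experiments)
    total = blue + white
    return 100.0 * blue / total, total


def normalized_efficiency(strain_yield, wild_type_yield):
    """Express a strain's transformant yield as a percentage of wild type."""
    return 100.0 * strain_yield / wild_type_yield


# Hypothetical (blue, white) counts from three independent experiments.
hypothetical_counts = [(120, 280), (150, 350), (30, 70)]
fidelity, n = pooled_fidelity(hypothetical_counts)

# Hypothetical yields (colonies per unit of linear plasmid DNA).
efficiency = normalized_efficiency(strain_yield=50.0, wild_type_yield=200.0)
```

Pooling before dividing weights each experiment by the number of colonies it contributed, which differs from averaging the per-experiment percentages when the experiments differ in size.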

RESULTS
Crystal Structure of the LIG Domain of MtuLigD-The C-terminal ligase domain of selenomethioninyl-substituted MtuLigD (residues 452-759) was crystallized by vapor diffusion using a precipitant solution containing PEG-3000 and ZnCl2. Crystals belonged to the space group P3₂21 (a = b = 57.1 Å, c = 369.0 Å) and contained two LIG protomers per asymmetric unit. The structure was solved by MAD methods and refined to a resolution of 2.4 Å with an Rwork of 20.3% and an Rfree of 24.8%. The polypeptides displayed good geometry with no Ramachandran outliers (Table 1). Continuous electron density was apparent for all amino acids of both protomers, except for residues 652-659 in molecule A and 655-656 in molecule B. The structures of the two protomers were virtually identical (Cα r.m.s.d. = 0.31 Å).
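As a plausibility check on the stated asymmetric unit content, the unit cell dimensions can be converted to a Matthews coefficient. The sketch below is an illustrative estimate only. It assumes six general positions for this trigonal space group, a protomer mass of about 34 kDa (308 residues at roughly 110 Da each, not a value reported in the text), and the conventional approximation for solvent fraction, 1 − 1.23/Vm.

```python
import math

# Sketch: Matthews coefficient for a trigonal cell (hexagonal axes).
# The protomer mass is an assumed estimate, not a measured value.


def hexagonal_cell_volume(a, c):
    """Cell volume in cubic angstroms for a = b, gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(volume, molecules_per_cell, mass_da):
    """Crystal volume per dalton of protein (cubic angstroms per Da)."""
    return volume / (molecules_per_cell * mass_da)


def solvent_fraction(vm):
    """Conventional estimate of solvent content from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


SYMMETRY_OPERATORS = 6        # general positions assumed for this space group
PROTOMERS_PER_ASU = 2         # stated in the text
ASSUMED_MASS_DA = 34000.0     # hypothetical estimate for residues 452-759

volume = hexagonal_cell_volume(57.1, 369.0)
vm = matthews_coefficient(
    volume, SYMMETRY_OPERATORS * PROTOMERS_PER_ASU, ASSUMED_MASS_DA
)
solvent = solvent_fraction(vm)
```

Under these assumptions the estimate comes to roughly 2.6 Å³/Da, or about 50% solvent, which is in the range commonly observed for protein crystals.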
As expected, the MtuLIG protein consists of an NT domain (amino acids 452-639) and an OB domain (amino acids 640-759). The tertiary structure is depicted in Fig. 2A; the secondary structure elements are shown over the amino acid sequence in Fig. 1. Comparisons to the structural data base (via DALI (63)) show that the NT and OB folds of MtuLIG are most closely related to those found in human Lig1 (the NT Cα r.m.s.d. is 2.3 Å for 171 residues and the OB Cα r.m.s.d. is 2.3 Å for 108 residues, with Z scores of 18.7 and 13.3, respectively). The AMP-binding pocket is contained within the NT domain and is composed of a cage of β-strands and connecting loops that include the five defining motifs of the covalent nucleotidyltransferase superfamily (2). The Lys-481 nucleophile in motif I (⁴⁷⁹EGKWDGYR) is located in a loop between the two antiparallel β-sheets that form the active site. The solvent-flattened experimental electron density map showed that an AMP molecule was linked covalently via a P-N bond to the Lys-481 side chain of each LIG protomer (Fig. 3A). The α-phosphate and the ribose 3′-OH of the nucleotide are exposed on the enzyme surface (Fig. 2, B and C), whereas the adenine base is buried in a hydrophobic sandwich between Phe-559 of motif IIIa (⁵⁵⁵EFWAFDLLYLDG) and Trp-482 of motif I on one side of the base, and Ile-616 in motif IV (⁶¹³EGVIAK) on the other side. Motif V (⁶³³WVKDKHWNTQE; colored blue in Fig. 2A) serves as the bridging segment between the NT and OB domains.
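The residue assignments quoted for each motif follow directly from the numbering of the first motif residue. A short lookup, given here as an illustrative sketch with hypothetical helper names, confirms that the positions named in the text correspond to the expected amino acids in the motif sequences.

```python
# Sketch: map residue numbers onto the motif sequences quoted in the text.
# Each entry is (number of the first residue, one-letter sequence).

MOTIFS = {
    "I": (479, "EGKWDGYR"),
    "IIIa": (555, "EFWAFDLLYLDG"),
    "IV": (613, "EGVIAK"),
    "V": (633, "WVKDKHWNTQE"),
}


def residue_at(motif, position):
    """Return the one-letter code at a given residue number within a motif."""
    start, sequence = MOTIFS[motif]
    offset = position - start
    if not 0 <= offset < len(sequence):
        raise ValueError(f"residue {position} is outside motif {motif}")
    return sequence[offset]


# Residues named in the text, with their expected one-letter codes.
named_residues = [
    ("I", 481, "K"),     # Lys-481, the AMP-linked nucleophile
    ("I", 482, "W"),     # Trp-482, stacks on adenine
    ("IIIa", 559, "F"),  # Phe-559, stacks on adenine
    ("IV", 616, "I"),    # Ile-616, opposite face of adenine
    ("V", 635, "K"),     # Lys-635, proximal motif V lysine
    ("V", 637, "K"),     # Lys-637, distal motif V lysine
]

all_consistent = all(
    residue_at(motif, number) == code for motif, number, code in named_residues
)
```

The same lookup places Asp-483 and Arg-486 in motif I and Glu-613 and Lys-618 in motif IV, in agreement with the assignments used in the description of the active site.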
Domain Dynamics-Motif V consists of two β-strands, one in the NT domain and a second in the OB domain, with a short interstrand linker that acts as a flexible hinge for large domain movements that occur during the ligation reaction (Fig. 4). Studies of related nucleotidyltransferase family members have shown that the enzyme-nucleotidylylation step is facilitated by closure of the OB domain over the nucleotide-binding pocket so that the essential RXDK peptide of motif VI, which is located near the C terminus of the OB fold, contacts the β- and γ-phosphates of the NTP substrate and positions the pyrophosphate leaving group apically to the attacking lysine (6, 17-19). The closed conformation is exhibited in the structure of Chlorella virus mRNA capping enzyme (6). In the structure of human DNA ligase I bound to a nicked DNA-adenylylate substrate (5), the OB domain (colored yellow in Fig. 4B) has undergone a twisting motion about the interdomain linker so that it presents a different surface of the OB domain for binding the duplex DNA than that used to engage the PPi leaving group during the ligase adenylylation reaction. Superposition of the MtuLIG structure on that of DNA-bound Lig1 suggests that the OB fold in MtuLIG has not yet attained the orientation of the DNA-bound protein (Fig. 4B), which leads us to infer that the final stage of the OB domain rearrangement is directly triggered by binding of the nicked DNA substrate over the AMP on the NT domain.
Architecture of the MtuLIG Active Site-The adenosine nucleoside of the MtuLIG-AMP intermediate is in the anti-conformation (Fig. 3A). This is in agreement with the anti-conformations of the nucleoside in the Chlorella virus ligase-AMP intermediate (4), the Thermus filiformis ligase-AMP intermediate (9), and the Candida albicans capping enzyme-GMP intermediate (7). In all of these cases, the anti-nucleoside conformation correlates with an open conformation of the OB domain relative to the NT domain. These findings contrast with the syn-nucleoside conformations seen in the crystals of T7 DNA ligase bound to ATP (3), the closed domain conformation of Enterococcus faecalis DNA ligase bound to NAD+ (10), and the closed domain conformation of Chlorella virus capping enzyme bound to GTP (6). Thus, the MtuLIG structure reinforces the suggestion that a change in nucleoside conformation from syn to anti after step 1 catalysis and domain opening is a conserved feature of the nucleotidyltransferase superfamily (2,4).
A rich network of protein-AMP contacts provides insight into substrate specificity and the nucleotidyltransferase mechanism. Putative specificity-conferring contacts between the LIG protein and the adenine base include the following: (i) hydrogen bonds from the exocyclic N6 amino group of adenine to both the backbone carbonyl of Gly-480 (motif I) and the side chain carboxylate of Glu-479 (motif I); (ii) a hydrogen bond from the backbone amide of Trp-482 (motif I) to the N7 atom of the adenine ring; and (iii) a hydrogen bond from the Nζ atom of Lys-618 (motif IV) to the adenine N1 atom (Fig. 3B). The backbone contacts from motif I to adenine seen in the MtuLIG-AMP intermediate are identical to those observed in the AMP complex of bacteriophage T4 RNA ligase 2 (46). The interaction of Glu-479 with the adenine amine is equivalent to the contacts of a glutamate with adenine in the ATP complex of T7 DNA ligase and a threonine-adenine contact in the Chlorella virus ligase-AMP intermediate (3,4). As in the present case, the side chain of the T7 and Chlorella virus ligases that interacts with adenine 6-NH2 is located two residues upstream of the lysine nucleophile. The direct contact of Lys-618 of motif IV to adenine N1 in the MtuLIG-AMP structure is reminiscent of the water-mediated hydrogen bond between adenine N1 and the motif IV Lys-209 side chain seen in the RNA ligase 2 structure (46).
There are several points of contact between MtuLIG and the ribose oxygens of AMP, including the following: (i) hydrogen bonds from the terminal guanidinium nitrogens of Arg-486 (motif I) and Arg-501 to the ribose O2′; and (ii) a water-mediated interaction between the carboxylate of Glu-530 (motif III) and the ribose O4′ (Fig. 3B). The bridging water is part of a hydrogen-bonding network to the backbone carbonyl of Gly-484 and a second water that, in turn, interacts with the backbone amide of Arg-486 (motif I) and the terminal nitrogen of Arg-501 (Fig. 3B). Arg-501 is located atop a β-hairpin loop on the surface of the NT domain of MtuLIG. It resides within a conserved element named "motif Ia" (47), which is found at an equivalent position in the tertiary structures of many other covalent nucleotidyltransferase family members. The arginine side chain is proposed to contact the γ-phosphate of the NTP substrate and/or the reactive 5′-PO4 end of the polynucleotide to be sealed (4-7, 46, 48). In the MtuLIG-AMP structure, there is no γ-phosphate with which Arg-501 can interact, so the contact with the ribose might represent a default state, as discussed previously for AMP-bound T4 Rnl2, where the equivalent Arg-55 side chain in motif Ia contacts the ribose O3′ (46).
FIGURE 3. Active site architecture of MtuLIG. A, experimental solvent-flattened electron density, contoured at 1.5σ above the mean, shows the covalent lysyl-adenylylate, a zinc cation (yellow sphere) coordinated by Asp-483 and Glu-613 (plus Asp-522 from a symmetry-related LIG protomer), and a chloride anion (green sphere) coordinated by a ribose hydroxyl. B, network of direct and water-mediated contacts between AMP (yellow carbon atoms) and the enzyme (gold carbons) are shown, along with contacts to the zinc (yellow sphere) and chloride (green sphere). C, MtuLIG-AMP active site constituents (gold lettering and carbon atoms) and the bound chloride anion (green sphere) superimposed on the equivalent constituents (gray lettering and carbon atoms) and the bound sulfate anion (gray) of the Chlorella virus DNA ligase-adenylylate structure. Also shown are the positions of the reactive DNA strands (depicted as green tubes) at the nick of the DNA-bound human Lig1 structure after superimposing its active site (not shown) on that of the other two ligases.
There are relatively few direct protein contacts to the AMP phosphate, these being limited to the covalent linkage to Lys-481 and an interaction with Lys-635 of motif V (Fig. 3B). The second defining lysine of motif V (Lys-637), although nearby, was not within hydrogen-bonding distance of the AMP phosphate. Similar phosphate contacts to the proximal motif V lysine, but not the distal lysine, have been seen in the Chlorella virus DNA ligase-AMP structure (4).
A Divalent Cation Binding Site-In the course of model building, significant electron density (>7σ in Fo − Fc difference maps) was observed between the AMP phosphate and a triad of acidic residues, Asp-483 and Glu-613 of motifs I and IV, respectively, plus Asp-522 from an adjacent LIG protomer in the lattice (Fig. 3A). We surmised that a metal ion was coordinated by the acidic side chains and the AMP phosphate. Because the LIG crystals were grown in the presence of ZnCl2, this density was modeled as a Zn2+ ion. To validate this assignment, diffraction data were collected from a LIG crystal at the zinc K absorption edge, and anomalous difference maps were calculated using phases from a model built and refined in the absence of metal. These maps revealed a peak (>10σ above the mean) centered between the phosphate and acidic residues, and unambiguously defined the extra electron density as a Zn2+ ion (depicted as a yellow sphere in Figs. 2A and 3A). These maps also revealed a second zinc atom located at a protomer packing interface, coordinated by Asp-492 and His-493 of one LIG protomer and Glu-727 of another (Fig. 2A).
Although the second bound metal is of unclear significance, the position and interactions of the first Zn2+ ion are likely to mimic those of the metals that support the strand joining activity of bacterial LigD (these are magnesium, manganese, and cobalt), despite the fact that zinc itself is not an effective cofactor for LigD (29). The motif IV carboxylate side chain that corresponds to the zinc-binding Glu-613 residue of MtuLIG is essential for the activity of all members of the covalent nucleotidyltransferase superfamily (13,17,19,47,49). Moreover, this motif IV carboxylate coordinates a divalent cation in the crystal structures of Chlorella virus DNA ligase-AMP and human Lig1 bound to nicked DNA-adenylylate (4,5).
A Bound Chloride Anion Provides Clues to Nick 5′-Phosphate Binding-During refinement, significant difference electron density (>5σ above the mean) was evident at a site near the AMP ribose 3′-OH. Given the peak height, neighboring atom identity, and positive charge density in the immediate region, we modeled a chloride anion in this position (depicted as a green sphere in Fig. 3A). In the final refined structure, the chloride is coordinated directly to the ribose O3′ and the backbone amide of Ala-463, and its negative charge is caged by four basic side chains nearby, Arg-486 (motif I), Arg-501 (motif Ia), Arg-629 (between motifs IV and V), and Lys-635 (motif V) (Fig. 3B). It is remarkable that the recently reported structure of T4 RNA ligase 1 bound to the nucleotide analog AMPCPP also includes a chloride ion coordinated by the ribose 3′-OH (50). The Chlorella virus DNA ligase-AMP intermediate contains a sulfate ion at a similar site on the surface of its NT domain; this sulfate is coordinated to the ribose 3′-OH, the motif Ia arginine side chain, an arginine located between motifs IV and V, and the proximal motif V lysine side chain (4). Thus, the sulfate in the Chlorella virus ligase-AMP structure engages in much the same contacts as does the chloride in the MtuLIG-AMP structure (see the superposition of the active sites in Fig. 3C). It is proposed that the sulfate in the Chlorella virus ligase-adenylylate structure mimics the position of the 5′-phosphate of the DNA nick prior to the catalysis of DNA-adenylylate formation (step 2) (4). By analogy, we suspect that the chloride anion in MtuLIG is located near the position occupied by the reactive 5′-phosphate of its DNA substrate. A superposition to the active site in the structure of human Lig1 bound to DNA-adenylylate highlights the fact that the chloride in MtuLIG and the sulfate in Chlorella virus ligase are located between the 5′ and 3′ DNA termini, albeit closer to the former (Fig. 3C).
Structure-guided Mutation Analysis of MtuLIG-Six of the putative active site residues (Lys-481, Asp-483, Glu-530, Glu-613, Lys-635, and Lys-637) were replaced by alanine in the context of the full-length MtuLigD polypeptide. The wild-type MtuLigD and the LigD-Ala mutants were produced in E. coli as His10-tagged fusions and were partially purified from soluble bacterial lysates by nickel-agarose chromatography. SDS-PAGE analysis showed that the nickel-agarose preparations were equally enriched with respect to the full-length 97-kDa LigD polypeptide (Fig. 5A). Each of the proteins was tested for strand joining activity using a singly nicked duplex DNA substrate at a 1:1 molar ratio of DNA to full-length LigD polypeptide. Wild-type ligase catalyzed nearly quantitative joining of the 5′-32P-labeled 18-mer strand to the unlabeled 18-mer 3′-OH strand at the nick to form a 36-mer product (Fig. 5B). All of the mutants except K635A were inactive in the composite sealing reaction (Fig. 5B). A protein titration showed that the specific activity of the K635A protein was about 20% of the wild-type value (Fig. 5C). The ablation of ligase activity by five of the six alanine mutations was not attributable to global misfolding of LigD, insofar as each of the LigD-Ala mutants retained its DNA polymerase activity (Fig. 6A).
Mutational Effects on the Ligase Adenylylation Reaction-The adenylyltransferase activity of the LigD proteins was assayed by label transfer from [α-32P]ATP to the 97-kDa LigD polypeptide to form a covalent enzyme-AMP intermediate (Fig. 6B). The extent of ligase-adenylylate formation by wild-type LigD was proportional to the amount of input protein (Fig. 6B). K635A was as active in autoadenylylation as wild-type LigD (Fig. 6B), consistent with its retention of overall nick-joining function (Fig. 5B). K637A formed about one-fourth the level of ligase-AMP as did an equivalent amount of wild-type LigD. Because the complete loss of function of the K637A mutant in overall nick ligation was out of proportion to the modest decrement in autoadenylylation, we surmise that Lys-637 plays a critical role during one of the downstream steps of the ligation reaction.
The K481A, D483A, E530A, and E613A proteins were apparently inert in the ligase adenylylation reaction (Fig. 6B). Thus, lack of activity of these four mutant proteins in the composite nick-joining reaction can be attributed to their step 1 defects. The adenylylation defect of the K481A mutant was expected, given that Lys-481 is the site of covalent AMP attachment to MtuLIG. The requirement for the motif III Glu-530 side chain was consistent with its crystallographic contacts to the AMP ribose seen here and in structures of other ligases, as well as with mutational studies of various DNA ligases, RNA ligases, and capping enzymes that consistently underscore its essentiality for nucleotidyltransferase activity in vitro and biological activity in vivo (13,16,17,19,49). The lack of step 1 activity upon elimination of motif IV Glu-613 underscores its essential function in binding the divalent cation cofactor required for the adenylyltransferase reaction and is consistent with mutational effects at the equivalent acidic residue of other nucleotidyltransferase family members (13,16,17,19,47,49).
The most surprising result was the stringent requirement for the motif I Asp-483 side chain for the ligase adenylylation reaction. This side chain is conspicuously not required for step 1 adenylylation in the case of Chlorella virus DNA ligase (4,12), human Lig1 (51), Methanobacterium DNA ligase (52), T4 RNA ligase 1 (53), Thermus thermophilus DNA ligase (54), or E. coli LigA (16) but is instead essential during downstream steps of the ligation reactions catalyzed by those enzymes. In the case of RNA capping enzymes, the motif I aspartate, although conserved, is not required for enzyme guanylylation in vitro or the composite RNA capping reaction in vivo (19,55,56). To our knowledge, LigD is the first instance in which the motif I aspartate plays an essential role in the attack of lysine on the NTP substrate. Based on the MtuLIG structure, we infer that it binds the divalent cation cofactor at this step.
Requirements for Phosphodiester Formation at a Pre-adenylylated Nick. The third step of the ligation reaction (phosphodiester formation) can be studied in isolation by assaying the sealing of a pre-adenylylated nicked duplex DNA (57) (Fig. 6C). By bypassing the requirement for steps 1 and 2, we can assess the capacity of step 1-defective or step 2-defective mutants to recognize the nicked DNA-adenylylate and catalyze strand closure. Wild-type LigD and the ligation-competent K635A mutant reacted with this substrate in the absence of ATP to catalyze phosphodiester bond formation, as evinced by the near quantitative conversion of the 5′-32P-labeled adenylylated DNA strand into a 24-mer product (Fig. 6C). The K481A protein was also able to seal the nicked DNA-adenylylate, demonstrating that the motif I lysine nucleophile is not essential for phosphodiester bond formation by LigD. This result agrees with similar findings concerning the motif I lysines of vaccinia, Chlorella virus, and Methanobacterium DNA ligases (12,52,57).
The other alanine mutations either abolished (E530A) or suppressed (D483A, E613A, and K637A) phosphodiester formation (Fig. 6C). The requirement for both of the metal ligands (motif I Asp-483 and motif IV Glu-613) and the ribose ligand (motif III Glu-530) in step 3 catalysis is consistent with previous studies of Chlorella virus DNA ligase, which showed that the motif I Asp, the motif III Glu, and the motif IV Glu enhance the rate of step 3 phosphodiester formation by factors of 60, 1000, and 60, respectively (13). The partial step 3 defect caused by loss of the distal motif V Lys-637 side chain was also consistent with studies showing that the distal motif V lysine contributes an 8-fold enhancement of the rate of step 3 catalysis by Chlorella virus DNA ligase (14). The complete loss of nick sealing activity elicited by the K637A mutation, in the face of partial isolated step 1 and step 3 defects, hints that Lys-637 might also play a key role during step 2 of the ligation pathway.
Role of the LigD Ligase Activity in DSB Repair in Vivo. To query the contributions of the LigD ligase component during NHEJ, we performed an allelic replacement in M. smegmatis, whereby a mutant gene encoding a ligase-defective, polymerase-active MsmLigD protein, K484A, lacking the motif I lysine nucleophile, was reintroduced at the ligD locus of a ΔligD strain. (Lys-484 in MsmLigD is the equivalent of Lys-481 in MtuLigD.) As a control, we constructed an isogenic strain in which the wild-type ligD gene was reintegrated at the ΔligD locus. NHEJ efficiency and fidelity were assayed by transformation with a kanamycin resistance plasmid that was linearized within a lacZ reporter gene by digestion with either EcoRV, which generates a blunt-ended DSB, or Asp718I, which generates a DSB with complementary 4-nucleotide 5′-overhangs (Fig. 7A). Successful transformation of M. smegmatis to kanamycin resistance by the linear plasmids depends on ligation of the ends to produce a circular DNA molecule. Faithful resealing of the original ends with no gain or loss of nucleotides will restore the lacZ reading frame and result in blue colony color on agar medium containing 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal). However, unfaithful sealing after insertion or deletion of nucleotides will usually result in a frameshift mutation that inactivates the lacZ gene and elicits a white colony phenotype (31). The fidelity of NHEJ is defined as the percent of transformants that have blue colony color.
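The fidelity definition above reduces to a simple ratio of colony counts. The following is a minimal sketch of that arithmetic; the function names and the colony counts in the usage lines are hypothetical and serve only to illustrate the calculation, not to reproduce any data set.

```python
def nhej_fidelity(blue, white):
    """Fidelity = percent of transformants with blue colony color."""
    total = blue + white
    if total <= 0:
        raise ValueError("at least one transformant is required")
    return 100.0 * blue / total


def mutagenic_percent(fidelity):
    """Percent of repair events that are unfaithful (white colonies)."""
    return 100.0 - fidelity


# Hypothetical colony counts, chosen only for illustration.
fidelity = nhej_fidelity(blue=49, white=51)
print(f"fidelity: {fidelity:.0f}%")
print(f"mutagenic events: {mutagenic_percent(fidelity):.0f}%")
```

Expressed this way, a fidelity of 18% corresponds to 82% mutagenic repair events.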
NHEJ in M. smegmatis reconstituted with wild-type LigD was characteristically error-prone (49% fidelity for the 5′-overhang DSB and 46% fidelity for the blunt-end DSB). In the ligD-K484A strain, the fidelity of 5′-overhang and blunt-end NHEJ decreased to 41 and 18%, respectively (Fig. 7B). This effect ran starkly counter to the increased fidelity (96% for 5′-overhang repair and 92% for blunt-end NHEJ) seen in the ΔligD strain (Fig. 7B). Whereas the elimination of the LigD protein resulted in a 10-fold decrement in the efficiency of 5′-overhang repair and a 20-fold decrement in the efficiency of blunt-end DSB repair, the active site mutation of the LIG domain reduced 5′-overhang and blunt-end NHEJ efficiency to approximately one-half and one-third of the wild-type levels, respectively (Fig. 7C). These results indicate the following: (i) the strand joining capacity of LigD accounts for only part of its DSB repair-promoting activity in vivo; (ii) at least one of the other strand-joining enzymes in mycobacteria (LigA, LigB, or LigC) can provide a backup when LigD is unable to seal the DSB ends; and (iii) it is not the sealing activity of LigD that is responsible for the mutagenic quality of mycobacterial NHEJ. Indeed, the sealing activity of LigD exerts an overall positive effect on the fidelity of DSB repair, as surmised from the fact that selective ablation of the LIG function causes reduced fidelity.
We determined the nucleotide sequences of rejoined EcoRV-cut plasmids from 20 independent lacZ− transformants of the wild-type LigD strain and 29 lacZ− transformants of the ligase-defective K484A strain (Fig. 8). In wild-type cells, 60% (12/20) of the unfaithful junctions contained nontemplated single nucleotide insertions, 10 of which were added between the otherwise unperturbed EcoRV ends and 2 of which occurred at one intact EcoRV end that was joined to a deleted terminus. Deletion at one or both ends occurred in 50% (10/20) of the junctions in wild-type cells (Fig. 8). In contrast, only 7/29 (24%) of the imprecise NHEJ junctions recovered from the K484A strain entailed nontemplated single nucleotide insertions, whereas 27/29 (93%) involved deletions at one or both termini (Fig. 8). Thus, the major effects of eliminating LigD ligase activity on the outcomes of mutagenic NHEJ at a blunt DSB end were to increase the occurrence of large deletions and diminish the frequency of small, nontemplated insertions.
[Fig. 7 legend. A, map of the NHEJ reporter plasmid, which contains a mycobacterial origin of replication (OriM), a kanamycin resistance gene (Kan), and a lacZ gene, within which are unique restriction sites for endonucleases EcoRV, which generates the blunt DSB end, and Asp718I, which leaves a 4-nucleotide 5′-overhang DSB end. B, NHEJ was assayed as described (31) by transformation of wild-type (WT) M. smegmatis, the LIG-defective ligD-K484A strain, and the ΔligD null strain with the reporter plasmid linearized with Asp718I or EcoRV. NHEJ fidelity (% blue transformants) is plotted on the y axis. The x axis indicates the ligD genotype. C, y axis is NHEJ efficiency normalized to the wild-type value (100%). The ligD genotype is indicated on the x axis.]
[Fig. 8 legend, partial. (For simplicity, we depict the microhomologies as resulting from 3′ end-trimming and exposure of complementary 5′ extensions; alternatively, they might arise from 5′ exonuclease trimming with exposure of complementary 3′ single-strand tails.) The number (x) of independently isolated identical junctions is indicated to the right (e.g. by 4× or 2×, respectively). One of the junctions formed in the LIG-defective strain occurred after nontemplated single nucleotide additions and an inversion of a 152-nucleotide (nt) segment from the right-hand EcoRV end.]
We conducted a similar analysis of the junction sequences of unfaithfully rejoined Asp718I-cut plasmids from 24 lacZ− transformants of the wild-type LigD strain and 20 lacZ− transformants of the ligase-defective K484A strain (Fig. 9). In wild-type cells, 19/24 events (79%) entailed templated fill-in synthesis at one or both DNA ends; the most common event was a 4-nucleotide fill-in at both Asp718I overhangs followed by blunt end-joining, which was recovered in eight independent transformants. In one event, a single nontemplated nucleotide was added to the filled-in blunt end. Deletions at one or both ends occurred in half of the junctions in wild-type cells (12/24) (Fig. 9). The outcomes were strikingly different at Asp718I ends in the K484A strain, where only 1/20 events (5%) entailed templated fill-in synthesis at just one end and deletion at the other end, whereas the remaining 19 events involved deletions at both ends (Fig. 9). The net effect of losing the LIG activity of LigD was to skew repair of a complementary 5′-overhang toward deletion formation and away from insertions, to an extent that the former occurred in 100% of the events scored (up from 50% in wild-type cells), whereas the latter events were reduced in relative terms by a factor of 16 (5% inserts in K484A versus 79% in wild-type cells).
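The percentages and the roughly 16-fold shift quoted for the 5′-overhang junctions can be rechecked directly from the event counts given in the text. The sketch below is only a bookkeeping aid: the dictionary layout and function names are our own, and the K484A deletion count of 20 is derived by adding the single fill-in-plus-deletion event to the 19 events with deletions at both ends.

```python
# Event counts quoted in the text for mutagenic 5'-overhang repair junctions.
# The data layout and names are illustrative, not taken from the paper.
JUNCTIONS = {
    "wild-type": {"total": 24, "fill_in": 19, "deletion": 12},
    # deletion = 1 (fill-in at one end, deletion at the other) + 19 (both ends)
    "K484A": {"total": 20, "fill_in": 1, "deletion": 20},
}


def percent(part, total):
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def summarize(junctions):
    """Compute fill-in and deletion percentages for each strain."""
    return {
        strain: {
            "fill_in_pct": percent(c["fill_in"], c["total"]),
            "deletion_pct": percent(c["deletion"], c["total"]),
        }
        for strain, c in junctions.items()
    }


summary = summarize(JUNCTIONS)
fold_reduction = (
    summary["wild-type"]["fill_in_pct"] / summary["K484A"]["fill_in_pct"]
)

for strain, s in summary.items():
    print(f"{strain}: fill-in {s['fill_in_pct']:.0f}%, "
          f"deletion {s['deletion_pct']:.0f}%")
print(f"relative reduction in fill-in events: {fold_reduction:.1f}-fold")
```

The unrounded ratio comes out slightly below 16, which agrees with the rounded factor stated in the text.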

DISCUSSION
Mechanistic Insights from the MtuLIG Structure. The crystal structure of MtuLIG provides the first view of a bacterial cellular ATP-dependent DNA ligase. As expected, MtuLIG is composed of NT and OB domains that are similar to the NT and OB modules found in other DNA ligases and RNA capping enzymes. The AMP-binding pocket of MtuLIG is composed of nucleotidyltransferase motifs I, Ia, III, IIIa, IV, and V, also as expected. MtuLIG apparently comprises a bacterial version of the "minimal" catalytic unit exemplified by Chlorella virus DNA ligase. However, the mycobacterial ligase is clearly not as functionally robust as Chlorella virus ligase, because the latter protein is an efficient nick-sealing enzyme in vitro and is fully competent to sustain all essential ligase functions in vivo in yeast, whereas MtuLigD is an inefficient nick-joining enzyme in vitro and cannot sustain cell viability on its own, either in yeast or in Mycobacterium (15,30,31).
An interesting similarity between MtuLIG and Chlorella virus ligase is that they both have a disordered loop between the first and second β-strands of their OB domains. The 23-amino acid loop in Chlorella virus ligase is rich in basic amino acids, is protease-sensitive in the free enzyme, and becomes protease-resistant upon DNA binding (4,58). The MtuLIG loop is shorter (8 amino acids), rich in glycine, and has only one basic side chain. Modeling of the Chlorella virus ligase OB domain onto the structure of DNA-bound human LigI suggests that the disordered loop might form a clamp around the DNA duplex entailing polar contacts to the backbone phosphates, which could account, in part, for the tight binding of the viral ligase to nicked DNA substrates (12,58). The different length and composition of the loop in MtuLIG might account for its poor affinity for nicked DNA, exemplified by the ease with which LigD dissociates from the substrate during the ligation reaction in vitro (30).
The present structural and functional analysis yielded additional clues to the ligase mechanism in several respects. A metal-coordination complex consisting of the AMP phosphate, the motif I Asp, and the motif IV Glu was revealed. The motif IV Glu was required for steps 1 and 3, consistent with analogous studies of other enzymes. The new twist was that the metal-binding motif I Asp was essential for the ligase adenylylation reaction, a property that distinguishes MtuLIG from other nucleotidyltransferase family members. A chloride anion was bound in the active site of MtuLIG in a position near that of a sulfate anion in Chlorella virus ligase-AMP and the nick 5′-phosphate in the DNA-bound LigI structure. A surrounding cage of basic amino acids in MtuLIG is conserved in Chlorella virus ligase, arguing that a common structural solution (a variant of an oxyanion hole) has emerged to recognize the nick 5′-phosphate. The fact that the chloride in MtuLIG is coordinated by the ribose hydroxyl of AMP lends credence to the generality of the proposal for substrate-assisted catalysis of step 2 (4,59), as a means to ensure that only the adenylylated form of ligase will bind to the nick.
NHEJ Outcome Reflects a Balance between End Remodeling and End Joining Activities. It was surprising to us that selective ablation of the strand joining activity of LigD would cause such a strong bias toward deletions and against insertions, given that the mutation of the LIG active site has no apparent impact on the POL activity of the mutant LigD protein in vitro (Fig. 6A) and given the recent genetic evidence implicating the POL component of LigD as the immediate catalyst of nontemplated nucleotide insertion during blunt-end NHEJ in vivo (38). Two questions thus arise. (i) How is ligation occurring during NHEJ in the K484A strain? (ii) Why are insertions so under-represented at the repair junctions in the K484A strain?
It is obvious that the sealing reactions at the plasmid termini in K484A cells cannot have been performed in toto by LigD, because the K484A mutation completely abolishes the essential ligase adenylylation step and, perforce, the subsequent DNA adenylylation step, which requires ligase-AMP formation. To account for the fact that NHEJ occurs in K484A cells with only modestly reduced efficiency, we must invoke one of two scenarios as follows: (i) LigA, LigB, or LigC can perform the entire end-joining reaction in lieu of the LigD-K484A protein; or (ii) one of the aforementioned ligases is able to form a DNA-adenylylate at the junction termini that can then be converted into a sealed phosphodiester by the LigD-K484A mutant. The present experiments do not discriminate between the two models, although our biochemical analysis of the LigD motif I lysine mutant clearly establishes its competence to execute step 3 if another ligase were to perform the first two steps of the ligation pathway. LigC is a plausible candidate to substitute for all or part of the LIG function of LigD, given our previous genetic evidence that LigC provides a backup pathway of Ku-dependent NHEJ in vivo in a ΔligD strain, albeit one of relatively low efficiency (31). It is noteworthy that LigC displays weak ATP-dependent nick joining activity in vitro and generates high levels of the DNA-adenylylate intermediate (30). Thus, the second scenario above, in which LigC generates DNA-adenylylate that is handed off for sealing to LigD-K484A, is not far-fetched. These ideas can be tested in future studies as follows: (i) by creating more complex mutants of M. smegmatis that contain the ligD-K484A allele plus ΔligC, ΔligB, or ΔligBC knock-out alleles; and (ii) by performing allelic exchanges with other LigD mutations that block all steps of the ligation reaction pathway, e.g. the E530A and E613A mutants characterized here.
The reduced frequency of insertion junctions during NHEJ in the K484A strain can be rationalized as follows: (i) insertion events are somehow not occurring, even though the POL activity of LigD is unperturbed by the K484A lesion; or (ii) insertions do occur, but they are subsequently eliminated by an active process of terminal deletion. We favor the latter alternative in the context of the following model engendered by comparing the effects of selectively inactivating the POL and LIG functions of LigD. Although both maneuvers result in modest reductions in NHEJ efficiency, the POL and LIG lesions exert opposite effects on the fidelity of blunt DSB repair. In a POL-defective strain, fidelity increases to 92%, which phenocopies the increased fidelity of the ΔligD strain (38). The LIG-defective strain has reduced fidelity, whereby 82% of the blunt DSB repair events are mutagenic. Nonetheless, when the molecular outcomes of mutagenic blunt DSB repair are examined, both the POL-defective and the LIG-defective strains are strongly biased toward deletions. This is sensible for the POL-defective strain, insofar as the data indicate that LigD POL is the agent of the +1 frameshifts that characterize blunt DSB repair in wild-type mycobacteria (38). The presumption is that junctions that have undergone nucleotide insertions in wild-type cells are subsequently sealed by the LIG component of LigD.
The junction sequence analysis highlights that wild-type mycobacteria have a parallel pathway whereby the DSB ends are resected prior to sealing, thereby resulting in deletions of varying sizes from one or both of the input DSB termini. The identity of the nuclease(s) responsible for the end resections in vivo is not known. The net outcome of NHEJ appears to be dictated by a kinetic balance among three routes: (i) sealing with no end-remodeling, which occurs about half the time in wild-type mycobacteria; (ii) sealing subsequent to insertion; and (iii) sealing subsequent to deletion. We posit that when ligation is less effective or is temporally delayed, the kinetic balance is perturbed so that many of the ends that have undergone nucleotide additions (be they templated or nontemplated) are subject to resection by nucleases before eventually being sealed. We invoke this scenario to explain the increased prevalence of deletions in the K484A strain. A critical next step will be to identify genetically the relevant deletion-promoting nuclease(s) and perturb their function in vivo. Once this is accomplished, we will be able to assess whether the skew toward deletions and away from insertions can be ameliorated when the nucleases are disabled.
The LIG-defective LigD strain described here and the POL-defective strain described elsewhere (38) provide potentially useful genetic tools to assess the significance of NHEJ to mycobacterial physiology. Of particular interest will be to determine how NHEJ influences the susceptibility of mycobacteria to a spectrum of DNA-damaging agents, the rates and molecular spectra of spontaneous and damage-induced mutations, the emergence of drug resistance, and ultimately, the pathogenesis and persistence of tuberculosis infection.