J. Biol. Chem., Vol. 279, Issue 16, 16471-16478, April 16, 2004
From the School of Biological Sciences, AgResearch Structural Biology Laboratory and **Centre of Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland, New Zealand
Received for publication, December 18, 2003, and in revised form, January 15, 2004.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
As part of a pilot structural genomics project aimed at the discovery of biological function, we have focused on gene products from the hyperthermophilic crenarchaeon Pyrobaculum aerophilum, an organism whose complete genome sequence was published recently (7). A whole-genome comparison of P. aerophilum and Mycobacterium tuberculosis, two organisms with very different, and in a sense extreme, lifestyles, led us to identify a set of 250 pairs of orthologous genes that are both widely distributed in nature and are shared by these two organisms. Among these were a set of four genes from P. aerophilum (PAE0151, PAE0285, PAE0337, and PAE2754) and four from M. tuberculosis (Rv0065, Rv0549, Rv0960, and Rv1720) that have since been clustered at NCBI as part of COG4113, with members drawn from Archaea, cyanobacteria, actinobacteria and
-proteobacteria (Table I). These are now annotated in Pfam as PIN domains.
In Archaea and thermophilic bacteria, PIN domains have been associated with a possible role in DNA repair. A recent analysis of conserved gene context across fully sequenced prokaryotic genomes revealed, in most Archaea and some thermophilic bacteria, a previously unrecognized cluster of genes containing DNA polymerases, helicases, nucleases, and many conserved hypothetical open reading frames, one of which is a PIN domain clustering in COG1848 (11). This suggested a new DNA repair system in these organisms. DNA repair, and particularly mismatch repair, in thermophiles is a vexing question. The absence of key mismatch repair enzymes such as MutS and MutL, which are highly conserved in mesophiles from Escherichia coli to humans, has been suggested to result in "mutator" lifestyles for some thermophiles (7), in which adaptive mutations enable the organism to adapt to stress or extreme environments (12). Also absent from many of the fully sequenced thermophilic genomes are several well conserved nucleotide excision repair enzymes. The discovery of a new DNA repair operon in Archaea addresses this question and by implicating PIN domains as part of this operon adds another piece of functional evidence for this large protein family.
Here we present the first crystal structure of a PIN domain, from the crenarchaeon P. aerophilum. The protein, the gene product of open reading frame PAE2754, proves to be a distant structural homologue of T4 RNase H and other exonucleases, despite insignificant sequence identity. Strict conservation of the active site residues suggests that this PIN domain is indeed an exonuclease. We have confirmed this functional hypothesis in vitro. This has important implications both for archaeal DNA editing and for the role of PIN domains in eukaryotic RNA editing. It is also an illustration of the power of structural genomics whereby deep phylogenetic lineages are apparent at the structural level and lead directly to functional characterization of proteins of previously unknown function or with equivocal and/or general functional annotation.
| EXPERIMENTAL PROCEDURES |
|---|
Two site-specific mutations, L65M and L80M, were designed and introduced to facilitate structure determination by multiwavelength anomalous diffraction (MAD) methods using the selenomethionine (SeMet)-substituted protein. The single mutants were individually made and tested for expression and crystallization, followed by the double mutant L65M/L80M. Mutagenesis was performed with the QuikChange site-directed mutagenesis kit (Stratagene). The double mutant was stable at 80 °C suggesting that the structure was not significantly destabilized by these mutations. The plasmid encoding the double mutant was transformed into the methionine auxotroph E. coli strain DL41(DE3) and grown in LeMaster medium with SeMet as the only methionine source. The SeMet-substituted double mutant protein (SeMet-PAE2754_MM) was then purified as above.
Both native PAE2754 and SeMet-PAE2754_MM were crystallized as described (13) and flash-cooled for data collection by soaking in cryoprotectant (mother liquor plus 10% glycerol) immediately prior to placement in a stream of cold N2 gas at 110 K.
Structure Determination and Refinement. Native PAE2754 x-ray diffraction data to 2.5-Å resolution were collected at the National Synchrotron Light Source, Brookhaven, on beamline X8C (λ = 1.0000 Å). MAD data at two wavelengths were collected for SeMet-PAE2754_MM at the Stanford Synchrotron Radiation Laboratory, beamline 9-1. The data were processed using DENZO and SCALEPACK (14). Data collection and refinement statistics are given in Table II. The structure of PAE2754_MM was determined by single anomalous diffraction using a single SeMet data set at λ = 0.9794 Å. This is not where the anomalous differences are maximized for selenium, but this was necessary because the "remote" wavelength data set proved to be of poor quality due to crystal decay (data were initially collected at two wavelengths in accordance with Ref. 15). Using SOLVE (16, 17), a total of 17 out of a possible 24 selenium sites (from the 8 monomers in the crystal asymmetric unit) were located, based on anomalous differences. These gave initial phases to 2.8 Å with a figure of merit of 0.18 and a Z-score of 20.6. The phases were improved using maximum likelihood density modification via RESOLVE (16). Successful improvement of the phases was dependent on user-defined non-crystallographic symmetry elements based on the initial selenium positions.
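The two wavelengths quoted above can be related to photon energy through E = hc/λ. The following is a minimal illustrative sketch, not part of the original data processing. The function name is hypothetical, and the selenium K-edge energy of about 12.658 keV is a standard tabulated value assumed here; it is not stated in the text.

```python
# Illustrative conversion of X-ray wavelength to photon energy, E = hc / lambda.
# Not part of the original data processing; names are hypothetical.

HC_KEV_ANGSTROM = 12.39842   # h*c expressed in keV * Angstrom
SE_K_EDGE_KEV = 12.658       # assumed tabulated Se K-edge energy (not from the text)


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in Angstrom."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # Wavelengths as quoted for the native and SeMet data sets.
    for label, wavelength in (("native", 1.0000), ("SeMet", 0.9794)):
        energy = wavelength_to_kev(wavelength)
        offset_ev = (energy - SE_K_EDGE_KEV) * 1000.0
        print(f"{label}: {wavelength:.4f} A = {energy:.4f} keV "
              f"({offset_ev:+.1f} eV relative to the assumed Se K edge)")
```

Under these assumptions the SeMet wavelength lies within a few electron volts of the selenium K edge, whereas the native wavelength is several hundred electron volts below it.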
74 residues per monomer) was built automatically with RESOLVE (16) and TEXTAL (18, 19), and the rest of the structure was built manually, during 12 cycles of model building with O (20) and refinement with CNS (21). Due to the relatively low resolution of the data, non-crystallographic symmetry was included as a restraint in all refinement cycles, and weighted experimental phases from RESOLVE were also included at all stages. The final PAE2754_MM structure was then used as a molecular replacement model for the native data, and this structure was completed using 5 cycles of model building and refinement. Refinement statistics for both models are given in Table II.
In Vitro Exonuclease Assays. An 18-bp primer (5'-CGCGCCGTTGCTATCTCC-3') was annealed to a 54-bp primer (5'-ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGATAGCAACGGCGCG-3'; where N = T, G, A, C; and K = T, G) to form double-stranded DNA with a 36-bp, 5'→3' single-stranded, randomized overhang. 200 pM DNA was then mixed with 200 pM PAE2754 in 20 mM NaCl and 10 mM MgCl2 (MgCl2 was omitted from the negative control) and incubated at 37 °C. The reaction was stopped at different time points by the addition of formamide gel loading buffer (80% v/v formamide, 10 mM EDTA), followed by freezing at -20 °C. Samples were run on a 20% polyacrylamide-urea denaturing minigel and visualized using ethidium bromide staining. Samples were also prepared in the same way using MnCl2 as the metal ion source.
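The substrate design described above can be checked directly from the two oligonucleotide sequences. The sketch below is illustrative only, and its function names are hypothetical. It confirms that the 18-mer is complementary to the 3' end of the 54-mer, computes the length of the single-stranded overhang, and counts the number of distinct sequences encoded by the degenerate N and K positions.

```python
# Illustrative check of the assay substrate design; function names are hypothetical.

# Complements for the IUPAC symbols used in the two oligonucleotides
# (K = G or T, so its complement is M = C or A).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N", "K": "M"}
# Number of bases each symbol can stand for.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "K": 2, "N": 4}

PRIMER = "CGCGCCGTTGCTATCTCC"
TEMPLATE = "ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGATAGCAACGGCGCG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def overhang_length(template: str, primer: str) -> int:
    """Length of the single-stranded overhang left when the primer anneals
    to the 3' end of the template; raises if the two do not pair there."""
    if not template.endswith(reverse_complement(primer)):
        raise ValueError("primer is not complementary to the template 3' end")
    return len(template) - len(primer)


def variant_count(seq: str) -> int:
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    total = 1
    for base in seq:
        total *= DEGENERACY[base]
    return total


if __name__ == "__main__":
    print("annealing site:", reverse_complement(PRIMER))
    print("overhang length:", overhang_length(TEMPLATE, PRIMER))
    print("sequence variants:", variant_count(TEMPLATE))
```

The four NNK triplets account for all of the degeneracy, so the overhang population contains 32^4 distinct sequences.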
| RESULTS |
|---|
Met mutations, that was constructed to facilitate phasing by MAD methods. This derivative structure was then refined to give a final R-factor of 0.226 (Rfree = 0.279). The native structure was then solved by molecular replacement and refined at 2.5 Å resolution to a final R-factor of 0.250 (Rfree = 0.305). The resulting model has good stereochemistry (Table II) with 92% of residues in the most favored region of the Ramachandran plot.
The PAE2754 monomer forms a single domain in which the 133-residue polypeptide is folded as an α/β/α stack, with a central twisted parallel β-sheet of five short strands (Fig. 1). The strand order is 32145, and the twist of the sheet is such that the outer strands, which involve just three residues each, are oriented at 160° with respect to each other. Between strand 2 and strand 3, helices α2 and α3 pack in an antiparallel manner to form a long protrusion that extends orthogonally from the α/β/α stack. Hydrophobic cores above and below the central β-sheet stabilize the α/β/α stack, and a third hydrophobic mini-core is formed below the stack by the orthogonal packing of helices α4 and α5. These hydrophobic cores are highly populated by Ala, Leu, and Val residues, which compose no less than 40% of the PAE2754 sequence.
2, 3, and 4. This interface buries 1440 Å² of surface area (19% of the total monomer surface) and is dominated by hydrophobic interactions, marked by a striking interdigitation of many large hydrophobic side chains. There are just six hydrogen bonds between the two protein chains, all centered around the stacked histidine aromatic rings at the center of the interface.
2 from one monomer contacts the N terminus of helix α6 from another. Stabilizing interactions at this interface are modest and include a salt bridge between Arg-48 from one monomer and Asp-110 and Glu-112 from another. The backbone carbonyl group from Arg-48 also hydrogen-bonds across the interface to the amide group of Arg-111. Although only 450 Å² of surface area per monomer is buried on formation of this interface, the cooperative association of two dimers buries in total 4 × 450 = 1800 Å². There is also a diagonally stabilizing interaction across the tetramer whereby a chloride ion linearly coordinates two arginine residues. The chloride lies at the center of a sphere of charged side chains that include Arg-48 (chain A), Glu-112 and Lys-116 (chain B), Glu-112 and Lys-116 (chain C), and Arg-48 (chain D).
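The buried-surface figures quoted for the dimer and tetramer interfaces can be checked with simple arithmetic. The sketch below takes the quoted numbers at face value. The function names are hypothetical, and the implied total monomer surface is a back-calculated estimate rather than a value reported in the text.

```python
# Illustrative arithmetic on the buried-surface figures quoted in the text.
# Function names are hypothetical; the monomer surface is back-calculated.

DIMER_BURIED_A2 = 1440.0                  # buried at the dimer interface
DIMER_BURIED_FRACTION = 0.19              # quoted as 19% of the monomer surface
TETRAMER_BURIED_PER_MONOMER_A2 = 450.0    # buried per monomer on tetramer formation
MONOMERS_IN_TETRAMER = 4


def implied_total_surface(buried_area: float, fraction: float) -> float:
    """Total surface implied by a buried area and the fraction it represents."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return buried_area / fraction


def total_buried(per_monomer_area: float, n_monomers: int) -> float:
    """Total area buried when each of n monomers contributes the same area."""
    return per_monomer_area * n_monomers


if __name__ == "__main__":
    surface = implied_total_surface(DIMER_BURIED_A2, DIMER_BURIED_FRACTION)
    print(f"implied monomer surface: {surface:.0f} A^2")
    buried = total_buried(TETRAMER_BURIED_PER_MONOMER_A2, MONOMERS_IN_TETRAMER)
    print(f"buried on tetramer formation: {buried:.0f} A^2")
```

On these figures the dimer interface is by far the larger contact, consistent with the description of the tetramer as an association of two dimers.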
Residues that are conserved across COG4113 (Figs. 2 and 3) are clustered in a pocket formed at the C-terminal end of the β-sheet and the N termini of helices α2 and α6. This arrangement brings together four conserved acidic residues (Asp-8, Glu-38, Asp-92, and Asp-110 in PAE2754) that point into the pocket and create a highly negatively charged hole. Two other conserved residues on either side of Asp-110, Thr-108 and Leu-112, also flank the acidic pocket, with Thr-108 being hydrogen-bonded to Asp-8. In the PAE2754 dimer (Fig. 2), the two acidic pockets are ~20 Å apart and are separated by an intriguing structure formed by the one remaining fully conserved residue, Tyr-91; the two Tyr-91 side chains lie adjacent at the dimer interface, with their aromatic rings parallel and 6 Å apart. Upon formation of the tetramer, the four active site pockets lie in the interior of a tunnel with restricted access via two openings on opposite sides of the tetramer (Fig. 2). Adjacent lysine and glutamic acid residues (Lys-45 and Glu-46) from each monomer flank the entrances to the tunnel.
positions, and 8% sequence identity. The topological match between PAE2754 and DAO in the overlaid region was relatively good, with matches for all the major elements of secondary structure in PAE2754 except
4, although DAO does have two large insertions of 90 amino acids (between
2 and
3) and 110 amino acids (between
4 and
5). DAO has an FAD cofactor whose nucleotide component binds into the hole where the active site is hypothesized for PAE2754. This appeared to support the PIN domain annotation of "possible nucleotide-binding protein" from the major data bases and was also consistent with the RNase hypothesis of Clissold and Ponting (10). A second match to the ADP binding domain of trimethylamine dehydrogenase (24), where the nucleotide was similarly orientated, added weight to this hypothesis.
Perhaps more significant, however, was the presence in the top 10 DALI structural matches of the T4 RNase H structure (25). This also has low DALI scores (Z = 2.8, r.m.s.d. = 3.6 Å over 84 amino acids and 10% sequence identity), and the topological matches between PAE2754 and T4 RNase H were significantly poorer than for DAO, with no matches for
1,
2,
4, or
7 of PAE2754. What was striking, however, was the observation that residues that are conserved across COG4113, including the acidic residues at the putative active site, aligned structurally with similar residues in T4 RNase H that are involved in Mg2+ binding and catalysis (Fig. 4). Furthermore, these residues are also conserved across a large family of related prokaryotic exonucleases. This led us to test Mg2+ binding and DNase activity in vitro.
In Vitro Tests for Exonuclease Activity. Exonuclease assays were carried out using synthetic DNA primers designed to give a long 5'→3' single-stranded overhang. A time course incubation clearly shows that PAE2754 has a Mg2+-dependent exonuclease activity (see Fig. 5). This experiment was repeated with Mn2+ in place of Mg2+ with equivalent results (data not shown), but no activity was seen in the absence of a suitable divalent cation. These initial tests show that the cleavage of single-stranded DNA by PAE2754 is slow and requires equimolar amounts of DNA, Mg2+ or Mn2+, and protein to provide catalysis. The sluggish reaction may be the result of the non-optimal substrate and/or the non-optimal temperature of the assay; we presume that the optimal temperature for this enzyme is 95-100 °C. Assays to determine substrate specificity and the optimal temperature are the subject of ongoing work.
1 and 6, respectively, and their carboxylate groups are fixed in place by typical helix N-cap hydrogen bonds (with Ala-11 NH and Tyr-113 NH). This is reminiscent of the first Mg2+ site in the Pyrococcus furiosus FEN-1 endo/exonuclease (29), where both the Asp residues that directly coordinate the Mg2+ ion are fixed at helix N termini. It seems likely, by analogy, that the Mg2+ site in PAE2754 may be similarly pre-organized, with Asp-8 and Asp-110 directly coordinating the metal. Asp-92 could also coordinate a metal ion bound in this way, either directly or indirectly via a water molecule, but Glu-38 is more remote (6 Å away). If two Mg2+ ions are bound, as is the case in T4 ribonuclease H, the flap exonucleases, and many other exo- and endonucleases and polymerases, it is likely that Glu-38 would participate in binding the second Mg2+ ion. If this is so, the distance between the two Mg2+ ions will be somewhere between the ~4 Å seen in the Klenow fragment of DNA polymerase (31) and the ~8 Å seen in the T5 5'-exonuclease (32).
The invariant threonine residue, Thr-108, is a candidate for involvement in catalysis but is fairly well buried, hydrogen-bonded to Asp-8 Oδ2 and Asp-110 NH, and it is difficult to see how it can play any direct role. Two other hydroxyl-containing residues, Ser-10 and Thr-89, which are almost fully conserved, are also adjacent to the metal site where they are fully exposed in the central tunnel and could play a role in catalysis or binding. Two other features of the active site region seem likely to be important. First, the pairwise, parallel stacking (Fig. 2) of the aromatic rings of the conserved Tyr-91 residues on the inner surface of the central tunnel suggests that these residues could participate in substrate binding by stacking on either side of a nucleotide base. Aromatic residues have been found previously (33, 34) to perform such a function in, for example, single-stranded RNA and DNA binding domains. Second, side chains of Lys-45 from each monomer project into the tunnel and could be involved in binding to nucleic acid phosphate groups. This residue is almost fully conserved as Lys or Arg.
The active site is only accessible from inside the tunnel through the tetramer, implying that nucleic acid substrates must thread through it. The opening to the tunnel is an oval with dimensions of ~10 × 14 Å, too small for double-stranded DNA or RNA, but consistent with the protein cleaving overhanging single-stranded nucleic acids or flap structures, as is the case for flap endonucleases.
| DISCUSSION |
|---|
135 sequence positions are conserved across the domain. Our structure of PAE2754 places these conserved residues close together in three-dimensional space. Together with our experimental evidence for nuclease activity, and the demonstrated structural homology with known exonucleases such as T4 ribonuclease H and the flap exonucleases, it further provides compelling evidence that PIN domains do indeed play a role in DNA and/or RNA editing processes through Mg2+-dependent exonuclease activity.
Multiple sequence alignments of PIN domains based on COGs or Pfam show the following three principal features: (i) three conserved aspartic acid residues, one near the N terminus of the protein and two clustered near the C terminus; (ii) a conserved threonine or serine adjacent to the last conserved aspartic acid, forming a (T/S)XD motif in which the threonine or serine possibly plays a catalytic role; (iii) a well conserved acidic residue (either Asp or Glu) in the center of the sequence. The acidic residues are clustered in such a way that they could support the coordination of either one or two Mg2+ ions. If two Mg2+ ions are bound, as is commonly the case in polymerases and nucleases (25), and has been proposed as essential for catalysis (31), it is probable that one will be bound to Asp-8, Asp-92, and Asp-110, which are in close proximity, and the second to Glu-38. The former site may be of higher affinity given that the side chains of Asp-8 and Asp-110 are fixed at the N termini of α-helices. On the other hand, a direct and essential role for Thr-108 in catalysis seems less likely, given its relatively buried location.
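The (T/S)XD motif described above lends itself to a simple pattern search. The sketch below is a minimal illustration, assuming a regular-expression scan over a one-letter protein sequence. The function name and the test sequences are hypothetical and are not taken from any real PIN domain; in PAE2754 itself the motif corresponds to Thr-108, residue 109, and Asp-110.

```python
# Illustrative scan for the (T/S)XD motif; names and sequences are hypothetical.
import re
from typing import List, Tuple

# A lookahead is used so that overlapping occurrences are all reported.
TXD_PATTERN = re.compile(r"(?=([TS].D))")


def find_txd_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the T/S residue, matched tripeptide)
    for every (T/S)XD occurrence in a one-letter protein sequence."""
    return [(m.start() + 1, m.group(1))
            for m in TXD_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence for demonstration only; it is not PAE2754.
    toy_sequence = "MAVLDTNALIAGSTLDKR"
    for position, motif in find_txd_motifs(toy_sequence):
        print(f"{motif} at position {position}")
```

A motif hit alone is weak evidence, since a three-residue pattern with one wildcard occurs frequently by chance; the alignments described in the text also require the conserved acidic residues elsewhere in the sequence.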
The ways in which PIN domains associate into oligomers, or are combined with other domains or other proteins, are likely to determine the types of editing in which they are involved and the types of substrates on which they act. Thus, the PAE2754 structure is tetrameric, both in solution (as shown by dynamic light scattering and gel filtration) and in the crystal, and the tunnel through the center of the tetramer provides quite restricted access to the active site. Our preliminary modeling suggested that only single-stranded nucleic acids, or single-stranded overhangs from duplex structures, could serve as substrates. This hypothesis is supported by our in vitro assays, using double-stranded DNA with a single-stranded overhang as a template, although we have not demonstrated any specificity in either substrate or the direction of DNA cleavage, and further functional studies clearly need to be carried out. We also note that the FEN1 endonuclease from Pyrococcus horikoshii forms a dimer that is topologically similar to the PAE2754 tetramer, with the two active sites inside the dimer and access via a hole to the exterior.
In contrast, a second example of what is clearly a PIN domain structure has recently been deposited in the Protein Data Bank (ID code 1O4W, deposited by the Joint Centre for Structural Genomics (San Diego)). This protein shares 23% sequence identity with PAE2754 and forms a very similar monomer in which 101 residues match with an r.m.s.d. of 3.0 Å. This Archaeoglobus fulgidus PIN domain forms a dimer in which the monomers are tethered by a stretch of 10 amino acids at the C terminus of the protein, thus separating the monomer active sites by 42 Å and creating a very different molecular environment from that of the four active sites in the PAE2754 tetramer.
An intriguing feature of the phylogenetic distribution of PIN domains is that they seem to be amplified in a number of species, sometimes to a remarkable extent. These include A. fulgidus, P. horikoshii, and Methanococcus jannaschii, all of which are thermophilic euryarchaeota. Among mesophilic bacteria, PIN domains also seem to be extraordinarily amplified in M. tuberculosis. COG1848, proteins from which have been predicted to be part of a new DNA repair system (11), includes no fewer than 14 PIN domain proteins from M. tuberculosis. Additionally, three other COGs are found to include a further 12 M. tuberculosis PIN domain proteins between them. This presents the intriguing question as to why M. tuberculosis would have 26 PIN domains as part of its DNA or RNA editing machinery.
This augmentation of the exonuclease family and the accumulation of PIN domain proteins in certain species raise intriguing questions as to the cellular function of the PIN domains. In thermophilic Archaea, it is reasonable to expect that there would be additional suites of DNA and RNA editing and repair mechanisms due to the elevated levels of oligonucleotide modification and damage caused by the high temperatures. Indeed, a predicted new DNA repair operon encompassing ~20 genes in Archaea contains a PIN domain protein (11, 12). However, the augmentation of the exonucleases in mesophiles is unexpected. M. tuberculosis is the most extreme example of the currently sequenced organisms, with its 26 PIN domain proteins, all of which contain the acidic quartet of Mg2+-binding residues and the adjacent serine or threonine. This suggests that a large retinue of exo- or endonuclease enzymes is present in this pathogenic bacterium.
There has been speculation about the functional relevance of the presence or absence of DNA repair genes in M. tuberculosis. It has been suggested that an absence of repair enzymes might be beneficial under conditions of stress or therapeutic treatment during long periods of stationary phase, as is the case for M. tuberculosis (36, 37). On the other hand, the correlation between those species lacking the mutS gene, and thus assumed to be mismatch repair-deficient, and the expansion, in the same species, of the PIN domain proteins may not be coincidental. We conclude that these PIN domain proteins are very likely substitutes for a number of exo- and endonucleases, with roles in DNA repair that are apparently missing from the M. tuberculosis genome and from the genomes of various extremophile species (12).
The second possible role for these nucleases is as a defense arsenal designed to neutralize phage via DNA and/or RNA degradation. It is the case that these functions have been adapted in eukaryotic PIN domains to degrade RNA via the RNAi and NMD pathways (10). Hence, in the eukaryotes it seems that editing and repair functions for PIN domains have likely been adapted to degradation pathways.
| FOOTNOTES |
|---|
* This work was supported by funding from the Marsden Fund and the Health Research Council of New Zealand. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
|| Present address: Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden.
¶ To whom correspondence should be addressed. Tel.: 64-9-373-7599; Fax: 64-9-373-7414; E-mail: v.arcus@auckland.ac.nz.
1 The abbreviations used are: NMD, nonsense-mediated degradation; COG, cluster of orthologous groups; PAE2754, protein encoded by open reading frame number 2754 from P. aerophilum; MAD, multiwavelength anomalous diffraction; SeMet, seleno-methionine; DAO, D-amino acid oxidase; r.m.s.d., root mean square difference.
| ACKNOWLEDGMENTS |
|---|
| REFERENCES |
|---|