Purification of the novel endonuclease, Hpy188I, and cloning of its restriction-modification genes reveal evidence of its horizontal transfer to the Helicobacter pylori genome.

We have isolated a novel restriction endonuclease, Hpy188I, from Helicobacter pylori strain J188. Hpy188I recognizes the unique sequence, TCNGA, and cleaves the DNA between nucleotides N and G in its recognition sequence to generate a one-base 3' overhang. Cloning and sequence analysis of the Hpy188I modification gene in strain J188 reveal that hpy188IM has a 1299-base pair (bp) open reading frame (ORF) encoding a 432-amino acid product. The predicted protein sequence of M.Hpy188I contains conserved motifs typical of aminomethyltransferases, and Western blotting indicates that it is an N-6 adenine methyltransferase. Downstream of hpy188IM is a 513-bp ORF encoding a 170-amino acid product, that has a 41-bp overlap with hpy188IM. The predicted protein sequence from this ORF matches the amino acid sequence obtained from purified Hpy188I, indicating that it encodes the endonuclease. The Hpy188I R-M genes are not present in either strain of H. pylori that has been completely sequenced but are found in two of 11 H. pylori strains tested. The significantly lower G + C content of the Hpy188I R-M genes implies that they have been introduced relatively recently during the evolution of the H. pylori genome.

Restriction-modification (R-M) systems were first recognized in Escherichia coli more than four decades ago (1, 2) because of their role as enzymatic barriers against genomic invasion by phages. The restriction endonuclease recognizes a specific sequence in DNA and cleaves the DNA, whereas the cognate methyltransferase modifies DNA at the same recognition sequence, preventing cleavage by the endonuclease. Based on subunit composition, co-factor requirements, DNA specificity characteristics, and reaction products, R-M systems may be classified as type I, type II, or type III (3). Type II R-M systems have the simplest architecture, usually consisting of two separate enzymes, a restriction endonuclease and a methyltransferase (3). They play an indispensable role in the manipulation of recombinant DNA and serve as models for the study of protein structures (4,5), catalytic mechanisms (6,7), and DNA-protein interactions (7-9).
Helicobacter pylori is one of the few bacteria that can colonize the human stomach (10,11). The colonization increases the risk of developing ulcer disease and gastric adenocarcinoma (12). Analysis of the entire genomic sequences of H. pylori strains 26695 and J99 predicted that these strains have 14 or 15 potential type II R-M systems (13,14). Comparison of the two strains demonstrated that the genomes are quite similar, with only 6-7% strain-specific genes (14). Diverse R-M systems comprise a large portion of the strain-specific genes. Despite their potential importance in H. pylori, few of these R-M systems have been described in detail. In this study, we purified a novel restriction endonuclease, Hpy188I, with a new specificity (TCNGA) from H. pylori strain J188, and then cloned the genes of this R-M system. The M gene contains the conserved motifs of aminomethyltransferases, but the R gene is unique. The system is present in some but not all H. pylori strains, and DNA analysis suggests that it was acquired by horizontal transfer.

EXPERIMENTAL PROCEDURES
Bacterial Strains, Growth Conditions, and Reagents-The bacterial strains used in this study (Table I) are from our laboratory collection and were cultured as described (15). Restriction enzymes and T4 DNA ligase were obtained from New England Biolabs (Beverly, MA). All columns used for protein purification were obtained from Amersham Pharmacia Biotech (Piscataway, NJ), unless otherwise indicated. Oligonucleotides used in this study (Table II) were synthesized either at New England Biolabs or at the Vanderbilt University Cancer Center DNA Core Facility using a Milligen 7500 DNA synthesizer.
DNA Techniques-Chromosomal and plasmid DNA were prepared as described (16). PCR and DNA sequencing were performed as described (15). Computer analyses of DNA and protein sequences were performed with the GCG programs (17,18), and database similarity searches were performed at the National Center for Biotechnology Information using the BLASTX algorithm (19,20).
Purification of Hpy188I-H. pylori cells were resuspended in ice-cold buffer A (20 mM Tris-HCl, 0.5 mM EDTA, 1 mM dithiothreitol, pH 7.5), then sonicated until ~50 mg of protein/g of cells was released. After centrifugation, the supernatant was applied to a 20-ml heparin Hyper-D column (Biosepra, Marlborough, MA). The column was washed with buffer A containing 0.05 M NaCl, and eluted with a 200-ml linear gradient of 0.05-1.0 M NaCl. Fractions were assayed for endonuclease activity by incubation at 37°C for 1 h in New England Biolabs buffer 4 (50 mM KOAc, 20 mM Tris-OAc, 10 mM Mg(OAc)2, 1 mM dithiothreitol, pH 7.9), using 1 µg of DNA as substrate, and further examined by electrophoresis. The fractions containing Hpy188I activity were pooled, diluted, and applied to a Mono S column, which was eluted with the same NaCl gradient as the first column. After the endonuclease assay, peak fractions of Hpy188I were pooled, diluted, and applied to a Poly-CAT A column (Custom LC Inc., Houston, TX). This column was eluted with a 40-ml linear gradient of 0.05 to 0.6 M NaCl. Finally, the enzyme-containing Poly-CAT A fractions were combined, diluted, and passed through a Mono Q column onto a heparin-TSK column (Tosohaas, Philadelphia, PA). The heparin-TSK column was eluted with a 0.05-0.6 M linear gradient of NaCl in 60 ml. Fractions containing Hpy188I activity were collected. The yield of the purified Hpy188I was ~1250 units/g of cells.
Amino Acid Sequencing of Hpy188I-Hpy188I activity among eluted fractions from the heparin-TSK column was titered by serial dilution. One unit of Hpy188I was defined as the activity needed to completely digest 1 µg of phage φX174 DNA at 37°C for 1 h in New England Biolabs buffer 4. About 300 units of Hpy188I were subjected to SDS-PAGE, and the proteins were visualized by silver staining. Based on the Hpy188I activity in these fractions and protein migration on the gel, a protein band was predicted to correspond to Hpy188I. An SDS-polyacrylamide gel, loaded with ~3000 units of Hpy188I, was electroblotted as described (21), and the membrane was stained with Coomassie Blue. The predicted band was excised and subjected to sequential degradation using an automated peptide sequencer (ABI model 470A, Perkin-Elmer Co., Foster City, CA).
Determination of Hpy188I Specificity-pBR322, pUC19, and φX174 DNAs were digested into well defined fragments using Hpy188I. Double digestion reactions also were performed in the presence of Hpy188I and a second endonuclease having a single recognition site in the substrate DNA (PstI and AlwNI for pUC19; ClaI, NdeI, and PstI for pBR322; and PstI, NciI, and StuI for φX174), which permitted mapping of the location of several Hpy188I cleavage sites in these DNAs. The sizes of the DNA fragments produced by Hpy188I digestion of the DNAs also were entered into the program SITES (22), which predicts recognition sequences. The locations of these potential recognition sequences were compared with the sites mapped by double endonuclease digestions. Then, the fragments predicted by cleavage at the putative recognition sites were compared with the observed restriction fragments from Hpy188I cleavage of the DNAs.
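The final comparison step, predicting fragments from a putative recognition sequence and checking them against the observed digest, can be illustrated with a minimal Python sketch. This is not the SITES program; the function names and the 20-bp toy circle are hypothetical, and the cut offset of 3 assumes cleavage after TCN as reported in this paper.

```python
import re

# Hpy188I recognition sequence reported in the text; N is any base.
# TCNGA is its own reverse complement, so scanning one strand is enough.
SITE = re.compile(r"(?=TC[ACGT]GA)")


def find_sites(seq):
    """Return 0-based start positions of every TCNGA match in seq."""
    return [m.start() for m in SITE.finditer(seq.upper())]


def circular_fragment_sizes(seq, cut_offset=3):
    """Fragment sizes (bp, largest first) after complete digestion of a circle.

    cut_offset=3 places the top-strand cut after TCN (an assumption taken
    from the cleavage position reported for Hpy188I).
    """
    n = len(seq)
    # Append the first 4 bases so that sites spanning the origin are found.
    extended = seq + seq[:4]
    cuts = sorted({(p + cut_offset) % n for p in find_sites(extended) if p < n})
    if not cuts:
        return [n]  # no site: the circle stays uncut
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(n - cuts[-1] + cuts[0])  # fragment that spans the origin
    return sorted(sizes, reverse=True)


toy = "AATCAGATTTTTCGGAAAAA"  # hypothetical 20-bp circular molecule
print(find_sites(toy))
print(circular_fragment_sizes(toy))
```

For a real substrate such as pUC19, the predicted size list would be compared with the bands seen on the gel; the fragment sizes always sum to the length of the circle, which is a quick sanity check.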
A method involving cleavage of a primed synthesis reaction (23) was used to determine the site of Hpy188I cleavage within the recognition sequence. Two oligonucleotides, M13q1 and M13q2 (Table II), located ~50 bp upstream and downstream of a recognition site at position 1353 in M13mp18, were used to perform two sets of sequencing reactions, using M13mp18 as template. Two extension reactions using the same combination of primer and template were carried out simultaneously in the absence of dideoxyribonucleotide terminators. The extended DNAs each were then digested with Hpy188I. The Hpy188I-digested DNAs were then subjected to electrophoresis through an 8% polyacrylamide gel in parallel with the two sets of sequencing reactions, and the gel was analyzed following autoradiography.
Cloning of Hpy188I Restriction-Modification Genes-The methyltransferase selection method (24) was used to clone the Hpy188I R-M system from H. pylori strain J188. To construct the genomic library of strain J188, 10 µg of chromosomal DNA was partially digested with Sau3AI, mixed with 1 µg of BamHI-digested pUC19, and ligated with T4 DNA ligase. The ligated DNA was transformed into E. coli ER2688, and transformants were selected on ampicillin LB plates. Plasmid DNA was prepared from the ampicillin-resistant colonies and digested with purified Hpy188I to destroy the plasmids not expressing the Hpy188I methyltransferase. The digested DNA was then re-transformed into ER2688, and transformants were selected on ampicillin plates. Plasmid DNA from ampicillin-resistant clones, confirmed to be Hpy188I-resistant, was sequenced to obtain the sequence of the M gene. These clones were also assayed for the presence of restriction endonuclease activity. Positive clones were sequenced, and the gene for the Hpy188I endonuclease was identified by comparison with the N-terminal amino acid sequence obtained from purified Hpy188I.
Preparation of Antibodies-Hapten-protein conjugates were prepared by periodate oxidation of the methylated nucleosides as described (25). Rabbits were immunized by injecting 500 µg of the protein conjugate, in complete Freund's adjuvant, intradermally and subcutaneously for the primary injections, and 250 µg in incomplete adjuvant via the subcutaneous route for each boost. The first test bleeds were taken 1 month after the initial injection and then at 3-week intervals.
Detection of Methylated DNA-The presence of N-6 adenine or N-4 cytosine methylated DNA was detected using Western blotting analysis. M.Hpy188I-methylated and control DNAs were denatured, serially diluted, spotted onto nitrocellulose membrane, and UV cross-linked prior to immunoblot detection. Western blotting was performed as described (26). In general, the antisera against N-6 methyladenine or N-4 methylcytosine were diluted 1:50,000-1:500,000 and developed using a horseradish peroxidase-labeled secondary antibody.

RESULTS
Purification and Amino Acid Sequencing of Hpy188I from H. pylori Strain J188-A crude extract of H. pylori strain J188 cells (8 g) was applied to a heparin Hyper-D column, and eluted with a linear NaCl gradient. A type II activity, designated Hpy188I, was detected in eluted fractions between 0.3 and 0.38 M NaCl, which were pooled and applied to a Mono S column. Hpy188I eluted between 0.26 and 0.3 M NaCl from this column, and Hpy188I-positive fractions were then applied to a Poly-CAT A column. After elution, Hpy188I activity appeared in a broad peak between 0.3 and 0.38 M NaCl, with a trace of a second endonuclease activity. To purify Hpy188I further, positive fractions were passed through a Mono Q column onto a heparin-TSK column, and the Hpy188I activity eluted between 0.38 and 0.42 M NaCl from the heparin-TSK column (Fig. 1, A and B). Hpy188I activity among these final fractions was titered on φX174 DNA. In total, more than 10,000 units of Hpy188I activity were present in fractions 39-46. Fraction 42 had the highest endonuclease activity (8 units/µl), followed by fraction 43 (4 units/µl) (Fig. 1B). SDS-PAGE of the relevant fractions (Fig. 1C) revealed that a protein band of ~21 kDa was present only in the lanes of fractions 42 and 43, which had the highest enzyme activity, but not in other lanes with less activity. The density of this band in lane 42 is higher than that in lane 43, which is consistent with the presence of higher enzyme activity in fraction 42. The size of this protein is in the range typical for type II endonucleases (3). Thus, we predicted that it was Hpy188I. N-terminal sequencing of this protein resulted in a sequence of 27 amino acids: XKRKXDIILKSVDDLKDXIDXKDFXYK (X, not identified).

TABLE II Primers used in this study

Determination of the Recognition Sequence and Cleavage Site of Hpy188I-To determine the recognition sequence, Hpy188I was used to digest pUC19, pBR322, or φX174 DNAs (data not shown). The patterns of the well defined fragments from Hpy188I-digested DNA were analyzed using the SITES program (22), which indicated that these differed from those of all known endonucleases. The sizes of digested fragments from each substrate DNA are consistent with cleavage at TCNGA symmetric sites. Mapping by digestion with additional endonucleases also predicted Hpy188I digestion to occur at TCNGA sites. Thus, we concluded that Hpy188I is a novel endonuclease with the specificity TCNGA.
To determine the cleavage site of Hpy188I within its recognition sequence, the extension products, using M13q1 and M13q2 as primers, and M13mp18 as template, were digested by Hpy188I. The digestion of the M13q1-extension product produced a band that migrated identically with the dideoxy termination product of the unspecified nucleotide in the recognition sequence, TCNGA (N is G in this location) (Fig. 2), indicating cleavage between the N and the G of the recognition sequence. Digestion of the M13q2-extension product with Hpy188I produced a band that also co-migrated with the unspecified nucleotide of the Hpy188I recognition sequence TCNGA (in this case, the N is the C on the opposite strand of DNA from the G in the M13q1 reaction) (Fig. 2). This result confirms cleavage between the N and G of the recognition sequence on this strand of DNA as well. Thus, Hpy188I cuts DNA symmetrically between N and G in its recognition site (TCN↓GA) on both DNA strands to produce a one-base 3'-extension.
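The geometry behind the one-base 3'-extension can be checked with a short Python sketch. It assumes only what is stated above: a palindromic 5-bp site cut after its third base on each strand. The function names and the toy sequence are hypothetical illustrations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def describe_cut(site_len, top_offset):
    """Overhang left by a symmetric cut in a palindromic site.

    top_offset is the number of site bases 5' of the cut on each strand.
    Returns (overhang_length, kind), where kind is "3'", "5'", or "blunt".
    """
    # The bottom-strand cut, expressed in top-strand coordinates.
    bottom_offset = site_len - top_offset
    length = abs(top_offset - bottom_offset)
    if length == 0:
        return (0, "blunt")
    return (length, "3'" if top_offset > bottom_offset else "5'")


def left_fragment(top, site_start, site_len=5, top_offset=3):
    """Both strands (each written 5'->3') of the fragment left of the cut."""
    top_cut = site_start + top_offset
    bottom_cut = site_start + site_len - top_offset
    return top[:top_cut], revcomp(top[:bottom_cut])


print(describe_cut(5, 3))             # TCN^GA
print(left_fragment("AATCAGATT", 2))  # toy duplex with one TCAGA site
```

In the toy duplex, the top strand of the left fragment is one base longer than its partner, and the extra base (the N of the site) sits at the 3' end, which is the one-base 3'-extension described in the text.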
Cloning and Analysis of the Hpy188I Restriction-Modification Genes-We next sought to clone the Hpy188I R-M genes. Digestion of plasmid DNA (2 µg) from a Sau3AI genomic library of strain J188, and retransformation of the digested DNA back into E. coli resulted in 44 ampicillin-resistant transformants. Plasmids from 3 of the 44 clones (numbers 16, 42, and 60) were confirmed to be Hpy188I-resistant, while they remained digestible by Sau3AI and HindIII (Fig. 3). Migration of the plasmid DNAs after incubation with Hpy188I was slower than that of the uncut plasmid, a shift that may be due to Hpy188I binding to DNA. The genomic DNA inserts in these plasmids were 3-4 kb long. The Sau3AI and HindIII digestion patterns of the plasmids were similar, indicating that all inserts cloned in these plasmids were from the same genomic locus, although their sizes were slightly different.
DNA sequence analysis showed that plasmid p#16 carried an ~3.6-kb genomic DNA insert. This insert possessed two complete ORFs of 1299 and 513 bp, oriented in the same direction and overlapping by 41 nucleotides, and two incomplete ORFs of 555 and 566 bp, one at the 5' and the other at the 3' end of the insert (Fig. 4). The 1299-bp ORF had regions similar to the nine conserved motifs found in aminomethyltransferases (27), indicating that it is the gene for M.Hpy188I, hpy188IM. The 513-bp ORF showed no similarity to any known gene in GenBank at either the DNA or the amino acid level. The two partial ORFs showed strong matches to genes identified in H. pylori strain 26695. The 555-bp partial ORF at the 5' end matched HP1177 (omp27), which encodes an outer membrane protein, and the 566-bp partial ORF at the 3' end was similar to HP1178 (deoD), which encodes a purine-nucleoside phosphorylase.
hpy188IM would encode a predicted 432-amino acid product with a molecular mass of 50.9 kDa, which is in the typical size range of DNA methyltransferases. Nine motifs identified in its product, including the (N/S/D)PP(Y/F) motif, are arranged in the order of motif X, and I to VIII. The longest variable region is near the C terminus, where the target recognition domain (TRD) presumably is located (27). The 1299-bp ORF of hpy188IM uses GTG as a translation start site. There is a potential translation start site, ATG, 155 bp downstream of the GTG. Expression of hpy188IM in E. coli starting from the GTG site generated a functional methyltransferase, while expression from the downstream ATG site did not (data not shown), indicating that the GTG site is the start codon for hpy188IM translation.
By endonuclease assay, we found that all Hpy188I-resistant clones demonstrated weak endonuclease activity (data not shown), suggesting that hpy188IR was also present in the inserts of these Hpy188I-resistant plasmids. The 513-bp ORF encodes a 170-amino acid product with a molecular mass of 20.3 kDa, which matches the size of the purified Hpy188I (Fig. 1C). Its predicted N-terminal sequence also matched the 27-amino acid sequence obtained from the purified Hpy188I protein. Furthermore, expression of this ORF in E. coli generated a functional Hpy188I (data not shown), indicating that it is the gene encoding Hpy188I.
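The relationship between the reported ORF lengths and product lengths is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the 110 Da average residue mass is a rule-of-thumb assumption that is not taken from this paper; the masses quoted in the text (50.9 and 20.3 kDa) were derived from the actual sequences and are expected to differ from such a rough estimate.

```python
AVG_RESIDUE_DA = 110.0  # assumed rule-of-thumb average residue mass


def protein_length(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the stop codon encodes no residue


def rough_mass_kda(n_residues):
    """Crude mass estimate in kDa from residue count alone."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


for name, orf_bp in (("hpy188IM", 1299), ("hpy188IR", 513)):
    n = protein_length(orf_bp)
    print(name, n, "residues, roughly", round(rough_mass_kda(n), 1), "kDa")
```

The residue counts reproduce the 432- and 170-amino acid products given in the text; the crude mass estimates come out a few kDa below the sequence-derived values, which is within what such an approximation can be expected to deliver.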
Detection of the Bases in the Recognition Sequence Methylated by M.Hpy188I-DNA methyltransferases have been divided into three distinct groups, α, β, and γ (27), based on the order of motifs and sequences in these motifs. The order of the M.Hpy188I motifs (X and I to VIII) and the sequences present in its motifs are the same as or similar to those from N-6 adenine methyltransferases in the γ group, indicating that M.Hpy188I is a member of the γ group. Thus we hypothesized that M.Hpy188I is most likely an N-6 adenine methyltransferase. Although no N-4 cytosine methyltransferases have yet been found to belong to the γ group, we could not rule out the possibility just on the basis of sequence analysis.
To determine whether our hypothesis is correct, Western blotting was performed against M.Hpy188I-methylated DNA. M.Hpy188I-methylated DNA, p#16, was prepared in E. coli DB23, which has no endogenous N-6 or N-4 methyltransferases (D. Byrd, S. Stickel, and R. J. Roberts, personal communication). p#16 from DB23 was resistant to digestion by Hpy188I, while pUC19 from DB23 was susceptible (data not shown), as expected, indicating modification of p#16 but not pUC19 DNA. pUC19 grown in a dam+ E. coli strain was used as a positive control for N-6 adenine-methylated DNA, while pBamM (28, 29) from DB23 was used as a positive control for N-4 cytosine-methylated DNA. When antibodies against N-6 adenine-methylated DNA were used as the probe (Fig. 5A), p#16 DNA gave a strong signal, like the positive control pUC19(N6-A) DNA. pBamM(N4-C) and the negative control DNA, pUC19(-), gave no signal, as expected. This result indicates that p#16 was methylated at the N-6 position of adenine in the Hpy188I recognition sequence TCNGA. In contrast, when antibodies against N-4 cytosine-methylated DNA were used (Fig. 5B), only the positive control DNA pBamM(N4-C) gave a hybridization signal. Thus, M.Hpy188I is an N-6 adenine methyltransferase.
Search for Hpy188I R-M Genes in the Complete Genomic Sequences of H. pylori Strains 26695 and J99-H. pylori strain 26695 was predicted to have 14 potential type II R-M systems (13). However, none of these systems showed similarity to the Hpy188I R-M system, indicating its absence from strain 26695. The sequences flanking the Hpy188I R-M genes, in which part of omp27 and deoD of strain J188 are located, match a region containing HP1177 (omp27) and HP1178 (deoD) in the genome of strain 26695 (Fig. 6) with >95% identity. The conservation of omp27 between the two strains continues for 96 bp upstream of the omp27 ORF, while the conservation of deoD stops near the end of its ORF (Fig. 6). In 26695, the two conserved genes are separated by a 365-bp segment, which contains no obvious ORF. In contrast, in J188, the corresponding region is 2457 bp and includes the hpy188IM-hpy188IR genes (Fig. 6). Overall, the 365- and 2457-bp regions share little similarity. The complete genomic sequence from a second H. pylori strain, J99, was recently published (14). DNA analysis indicated that J99 has no Hpy188I R-M system either. Two genes, omp27 and deoD, are also highly conserved in J99, and their conservation ends at the same locations as those in the other two strains (Fig. 6). Furthermore, a 372-bp region separates the two conserved genes in J99 and shares >90% identity with the 365-bp region of 26695, but no similarity to the 2457 bp of J188.
Analysis of the hpy188IM-hpy188IR locus of strain J188 reveals 92-bp direct repeats with only 3 mismatches, which flank the R-M genes (Fig. 6). The 92-bp repeat on the right is located at the junction of the 2457-bp region and the conserved deoD (Fig. 6), and 79 bp of this repeat corresponds to the 3' end of the deoD ORF, a region conserved among all three strains. The 92-bp repeat on the left is located 171 bp downstream of the conserved omp27 (Fig. 6). The sequence of the 171-bp region is completely different from those in the corresponding regions of 26695 or J99, and has no homologs elsewhere in either of the sequenced strains, suggesting that it has a different origin. The 92-bp direct repeats are not present in the corresponding intergenic region of 26695 and J99, suggesting that these repeats are related to the acquisition event of the Hpy188I R-M system in strain J188.
There is a 49-bp segment located 4 bp downstream of the right 92-bp repeat and 217 bp upstream of the hpy188IM ORF in strain J188 (Fig. 6). This segment shows strong similarity to segments of the same length in strains 26695 (with 7 mismatches) or J99 (with 11 mismatches) that lie 2 or 6 bp downstream of the deoD ORF (Fig. 6). This segment is not found elsewhere in the 2 sequenced H. pylori genomes. Thus, it is unlikely that the J188 version represents a chance similarity. This arrangement in which the Hpy188I R-M genes adjoin the 49-bp segment and deoD suggests that a module containing the R-M system may have integrated specifically into the region between the 49-bp segment and the 3' end of deoD. The event may have resulted in the 92-bp duplication that is now seen. Considering the possibility of Hpy188I involvement in DNA mobility, we checked for the presence of its recognition site, TCNGA, in the related regions. However, the locations found (Fig. 6) do not suggest that they were directly involved in the integration event.
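The mismatch counts quoted for the 92-bp repeats and the 49-bp segment translate directly into percent identity. The Python sketch below shows the conversion together with a simple position-by-position mismatch counter for two aligned, equal-length sequences; the function names are hypothetical, and the actual repeat sequences are not reproduced here.

```python
def count_mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (ungapped)")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def percent_identity(length, mismatches):
    """Percent identity of an ungapped alignment of the given length."""
    return 100.0 * (length - mismatches) / length


# Values taken from the text: (aligned length, mismatches).
reported = {
    "92-bp direct repeats (J188)": (92, 3),
    "49-bp segment, J188 vs 26695": (49, 7),
    "49-bp segment, J188 vs J99": (49, 11),
}
for label, (length, mm) in reported.items():
    print(label, round(percent_identity(length, mm), 1))
```

On these figures, the direct repeats are about 97% identical, whereas the 49-bp segment is about 86% and 78% identical to its counterparts in 26695 and J99, respectively.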
To further investigate the origin of this R-M system, the G + C content of these regions was calculated. The G + C content of the 2457-bp J188-specific region in strain J188 was 28.8%, while that of the Hpy188I R-M ORFs was 29.9%. The G + C content in the 365-bp intergenic region of strain 26695 is 32.3%, and a similar G + C content is observed in the J99 equivalent region. In contrast, the G + C content in the flanking regions, including omp27 and deoD, was 39.5%, which matches the overall G + C content (39%) of H. pylori (13,14,37). The significantly lower G + C content of the 2457-bp J188-specific region strongly suggests that the hpy188IM-hpy188IR locus was introduced during the evolution of the H. pylori genome.
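The G + C calculation itself is straightforward; a minimal Python sketch is given here for readers who wish to repeat it on their own sequences. The function names, the choice to ignore ambiguous bases, and the window parameters are assumptions of this sketch and do not describe the software used in this study.

```python
def gc_content(seq):
    """Percent G + C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """(start, percent G + C) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("ATGC"))
print(windowed_gc("GGGGAAAA", window=4, step=4))
```

A sliding-window profile of this kind is one way to visualize a low G + C insertion against flanking sequence of average composition, as with the 2457-bp region and the flanking omp27 and deoD genes.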
Study of Hpy188I Diversity Among Various H. pylori Strains-To further study the diversity of the Hpy188I R-M system, chromosomal DNAs from 10 H. pylori strains, including J188 and 26695, were examined for their modification at TCNGA sites. As expected, the DNA of J188 was resistant to Hpy188I digestion, indicating modification at TCNGA sites, whereas DNA from 26695 was digestible, corresponding to the absence of the R-M system in its genome (Fig. 7A). DNA from seven other strains was digestible, but strain J166 was resistant, suggesting that the Hpy188I R-M system is present in J166, but not the other strains. To confirm this observation, a pair of primers, QII ORF-F and QII ORF-R, corresponding to the 5' end of hpy188IM and the 3' end of hpy188IR, respectively, were used to amplify the same set of DNAs (Fig. 7B). Only DNA from J188 and J166 gave PCR products with the predicted size of 1.8 kb, whereas no PCR products were observed for the other strains. Thus, only J188 and J166 among the strains tested have the Hpy188I R-M system.
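The predicted 1.8-kb product follows from the ORF lengths and the 41-bp overlap reported earlier, assuming the primers sit at the outer ends of the two ORFs. The Python sketch below works through that arithmetic; the function name and the zero-based coordinate convention are hypothetical.

```python
def place_overlapping_orfs(len_a, len_b, overlap):
    """Coordinates of two same-strand ORFs where b starts inside the end of a.

    Coordinates are 0-based and half-open, relative to the start of ORF a.
    """
    start_b = len_a - overlap
    return {
        "a": (0, len_a),
        "b": (start_b, start_b + len_b),
        "span": start_b + len_b,   # distance from start of a to end of b
        "b_frame": start_b % 3,    # reading frame of b relative to a
    }


layout = place_overlapping_orfs(len_a=1299, len_b=513, overlap=41)
print(layout)
```

The span is 1771 bp, in line with the 1.8-kb product. Because 41 is not a multiple of 3, the endonuclease ORF is read in a different frame from the methyltransferase ORF.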
To investigate the corresponding regions of the hpy188IM-hpy188IR locus among these strains, primers QII-F and QII-R, corresponding to the 5' and 3' ends of the conserved omp27 and deoD, were used for PCR (Fig. 7C). As expected, DNA from J188 yielded a PCR product of the predicted size of ~3.0 kb. J166 DNA yielded a product of the same size, indicating that both the size and location of the Hpy188I-integrated region in J166 resemble those for J188. DNA from 26695 yielded a PCR product that matched the expected size of 1.0 kb. The other strains yielded PCR products of the same size as for 26695, except for 60190 and J178, which yielded 1.4- and 1.1-kb products, respectively. To further assess this heterogeneity, PCR fragments from strains A101, J262, 60190, and J178 were sequenced. The sequences of A101 and J262 shared >80% identity with that of 26695, and those of 60190 and J178 shared >60% identity. The presence of direct repeats (with sizes varying between 60 and 150 bp) in the J178 and 60190 sequences made their PCR products larger than those of the rest of the strains. These data indicate that only one genotype is present in the region between omp27 and deoD among the strains not possessing the Hpy188I R-M system. All ORFs present in the region are <125 bp, suggesting no functional genes. The 49-bp segment identified in strains J188, 26695, and J99 is also present at a similar location in the intergenic region of these 4 strains.

DISCUSSION
We have cloned and sequenced genes encoding the Hpy188I R-M system, a novel type II R-M system from H. pylori strain J188. Only two of 11 H. pylori strains examined possess this R-M system, and its significantly lower G + C content strongly suggests that this R-M system is a relatively recent acquisition by H. pylori.
Comparison of the J188 hpy188IM-hpy188IR gene locus and the genomic sequence of strain 26695 indicates that the Hpy188I R-M system was introduced into a 365-bp intergenic region with a G + C content (32%) lower than average (39%) for H. pylori. Five regions with a significantly different G + C composition have been found in the genome of strain 26695 (13), but this 365-bp region is not located in any of these previously identified regions. The 365-bp region is also present at the same location in J99 and other strains that do not carry the Hpy188I R-M system, indicating conservation of this low G + C region. The Hpy188I R-M genes are flanked by 92-bp direct repeats, a situation that resembles the 37-40-kb cag pathogenicity island (cagPAI) in H. pylori (31), which also has a lower G + C content and is flanked by 31-bp direct repeats. The presence of these direct repeats further suggests that the hpy188IM-hpy188IR locus could have integrated into the H. pylori genome during a transposition event. A 49-bp segment, downstream of deoD, in the region of 26695 and others, is also present in the J188 region. The R-M system could have specifically integrated into the site between this small segment and deoD. However, it is unclear how this 49-bp segment was preserved while the rest of the original region was replaced by a completely different sequence.
In studies of the R-M systems of EcoO109I, AccI, BglII, Eco47I, and others (32-36), components of either prophages or transposons were found closely associated with their R-M genes. In the case of Hpy188I, however, no mobility genes can be identified immediately adjacent to its genes, which is also true for most of the type II R-M systems predicted in the two sequenced strains (13,14). These data suggest that H. pylori uses a different mechanism for the horizontal transfer of R-M systems. One possibility is that the Hpy188I R-M system integrated into the H. pylori genome by homologous recombination in a region of lower G + C content. Another mechanism for horizontal transfer is exemplified by intron homing and transposition (37,38), where endonucleases play the key role in introducing double strand breaks. While restriction enzymes have not yet been directly implicated in such events, it remains possible that they could initiate DNA insertion events. Analysis of the region between omp27 and deoD among 6 strains not possessing the Hpy188I R-M system reveals that 3 have a TCNGA site. It is conceivable that the restriction activity of Hpy188I could have facilitated the integration process of its R-M genes into the H. pylori genome. If this were the case, a TCNGA site could have been located in the region between the 49-bp segment and the deoD ORF, and might have provided the initial break point. Hpy188I could have cleaved this site, and subsequently, its R-M genes could have been integrated into this cleaved site. The origin of the 92-bp direct repeats of the target sequence near the 5' end of deoD is unknown, but again is reminiscent of transposition events.
Strains J188 and J166 have the Hpy188I R-M system integrated in the same region, suggesting that they might have arisen from the same original strain. However, their genotypes at three other loci, vacA (39), cagA (40-42), and iceA (43) (Table I), are substantially different from each other, indicating that they are not closely related. Thus, the differences at these loci may be explained by two independent Hpy188I R-M system integration events into this lower G + C region in two separate strains. If this is true, the lower G + C region may be a particularly hot site for integration. Alternatively, J188 and J166 may have arisen from the same original strain that acquired the R-M system, and have subsequently diverged at the vacA, cagA, and iceA loci.
The Hpy188I R-M genes cloned in this study were present in only two of 11 H. pylori strains tested. The diversity of this R-M system among various strains is consistent with studies on other type II R-M systems in H. pylori. These include iceA1-hpyIM, an NlaIII-like R-M system (15,43) where the R gene is allelic with a non-R gene, and a DdeI isoschizomer (44) which resulted from an integration event. In addition, comparison of the genomic sequences of strains J99 and 26695 indicates that some major strain-specific components are R-M genes (14). Finally, a study examining genomic differences between H. pylori strains J166 and 26695, using a PCR-based subtractive hybridization method, shows that seven of 18 DNA clones specific to J166 appear to be R-M genes (45). Although we found the Hpy188I R-M system to be present in J166, but not in 26695, this difference was not found during the previous study (45).
This study exemplifies the apparent propensity of H. pylori to accumulate R-M systems, presumably by integrating them into inactive positions of the genome. It is unknown, however, why this organism, for which there are no known bacteriophages, needs so many R-M systems. A feature of H. pylori infection is its persistent colonization of the human stomach mucosa for years or decades (10,11). Clearly, H. pylori is well adapted to this gastric environment, and it is tempting to think that the acquisition of so many R-M systems might be related to this unique lifestyle. Restriction enzymes and their associated methyltransferases in H. pylori might play a biological role that we have yet to discover.