The Non-LTR (Long Terminal Repeat) Retrotransposon L1Tc from Trypanosoma cruzi Codes for a Protein with RNase H Activity

The deduced amino acid sequence of the region downstream of the reverse transcriptase (RT) motif of the Trypanosoma cruzi L1Tc non-LTR retrotransposon shows significant homology with the sequences of proteins with RNase H activity from different organisms and retroelements. The 25-kDa His6-tagged recombinant protein bearing only the L1Tc RNase H domain, named RHL1Tc, exhibits RNase H activity as measured on the [3H]poly(rA)/poly(dT) hybrid used as substrate as well as on specific homologous and heterologous [32P]RNA/DNA hybrids. Mutation of the conserved aspartic acid at position 39 of the enzyme catalytic site, but not of the serine at position 56 (a non-conserved amino acid), abolishes the RNase H activity of the protein. The RNase H activity of the RHL1Tc protein is Mg2+-dependent, and the protein is also active in the presence of the Mn2+ ion. RNase H activity is optimal at pH 8 and 37 °C, although the protein also has significant enzymatic activity at 19 °C and pH 6. However, it cannot be excluded that the RNase H activity level and its optimal conditions may be different from those of a protein containing both RT and RNase H domains.

Sequencing of the human genome and those of several other organisms has revealed the existence of a high number of retrotransposable elements such as LINEs (long interspersed nucleotide elements) and SINEs (short interspersed nucleotide elements) (1). A common characteristic of these elements is that they are mobilized into the genome via an RNA intermediate. LINEs are considered autonomous elements because they encode the proteins involved in their own transposition process. In contrast, because SINEs, such as Alu sequences, lack a protein-coding capacity, it has been suggested that they use the enzymatic machinery of LINEs in trans (1). In the human genome there are from 3000 to 5000 full-length copies of LINE-1 dispersed throughout the genome, accounting for 15-17% of its mass (2). Recent studies have implicated these elements in relevant biological processes such as gene regulation (3,4), modeling of the genome shape (5), or DNA-shuffling phenomena (6). Moreover, it has been shown that insertion of LINEs in certain genes affects the outcome of some diseases and has been associated with the generation and progression of certain cancers (7,8).
LINEs use a transposition mechanism in which the reverse transcription of the RNA template is primed by the release of a 3′-hydroxyl group (9) after cleaving chromosomal DNA with an endonuclease activity encoded by the element (10,11). To synthesize the second strand DNA, the RNA template has to be removed from the RNA/cDNA hybrid by an RNase H activity, which may be supplied by the host cell (12). Recent studies (13) have described in some non-LTR retrotransposons the existence of sequences with a certain homology to proteins with RNase H activity. However, RNase H activity has not yet been demonstrated in non-LTR retrotransposons.
L1Tc is a highly repetitive non-LTR retrotransposon widely distributed in the genome of the parasite Trypanosoma cruzi (14,15), which is actively transcribed into poly(A+) RNA. The first L1Tc-characterized copy corresponds to a cDNA containing three ORFs (14). The second and third ORFs showed significant homology with the RT and cysteine motifs of the pol and gag genes, respectively, from retroviruses and LTR retrotransposons. The first ORF exhibited an endonuclease domain, and the recombinant protein encoded by this ORF showed apurinic/apyrimidinic (AP) endonuclease activity (16) and 3′-phosphatase and 3′-phosphodiesterase activities (17). Studies of the genomic organization and distribution of L1Tc in T. cruzi show the existence of some L1Tc copies formed by a single ORF of 5 kb that contains the endonuclease, RT, and cysteine motifs (15).
… TACTAGTCGACTTGTTGGTGAGCTGGG-3′) were used to amplify by PCR the homologous L1Tc region, which in the L1Tc cDNA copy (14) is between positions 3571 and 4182, using the pBAC14 vector (15) as template. SacI and SalI sites (underlined) were generated ad hoc in the amplified fragment, which was subsequently digested with those enzymes and cloned into the pQE30 expression vector (Qiagen). The resulting plasmid, containing 603 bp of the amplified region, was called pQRHL1Tc (GenBank™ AY045718). Two copies of the amplified fragment were cloned in-frame into the pQE30 expression vector, generating the vector pQRH2L1Tc.
Site-directed Mutagenesis of the L1Tc RNase H Domain-Mutagenesis of the L1Tc RNase H domain was performed using a PCR-based technique as described (18), with pQRHL1Tc as template. Briefly, separate PCRs were performed using a set of site-directed mutagenic primers and pQE30 vector-specific primers. For the aspartic acid mutation at position 39 (GAT → AAT) the oligonucleotides used for the PCR were D/N-sense, 5′-GCGACGAATGGCGGTGTAGACG-3′ and D/N-antisense, 5′-CTACACCGCCATTCGTCGC-3′. Primers for the mutation of the serine at position 56 (TCG → GCG) were, respectively, S/A-sense, 5′-CCCTCAATTCAGCGGAGATAATAGA-3′ and S/A-antisense, 5′-TTATCTCCGCTGAATTGAGGG-3′. Mutated nucleotides are underlined. Boldface letters indicate the modified nucleotides. The vector-specific primers were pQE-forward, 5′-GGCGTATCACGAGGCCCTTTCG-3′ and pQE-reverse, 5′-CATTACTGGATCTATCAACAGG-3′. The 0.6-kb amplified fragments containing the desired mutations were digested with SacI and SalI and subcloned into the pQE vector digested with the same restriction enzymes, generating clones pQRHD39NL1Tc and pQRHS56AL1Tc, respectively. The introduced mutations were confirmed by DNA sequencing.
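The primer design described above can be checked computationally. The following is a minimal sketch in plain Python, not part of the original protocol: all function and variable names are illustrative assumptions. It verifies that each antisense primer is the reverse complement of the 5′ portion of its sense partner and that the stated codon changes encode the intended residues (only the four codons involved are tabulated, using the standard genetic code).

```python
# Sanity check for the mutagenic primer pairs quoted in the text.
# Names are illustrative; only the primer and codon sequences come from the paper.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, restricted to the codons mentioned in the text.
CODON_TABLE = {"GAT": "D", "AAT": "N", "TCG": "S", "GCG": "A"}

# (sense primer, antisense primer), both written 5'->3'.
PRIMER_PAIRS = {
    "D39N": ("GCGACGAATGGCGGTGTAGACG", "CTACACCGCCATTCGTCGC"),
    "S56A": ("CCCTCAATTCAGCGGAGATAATAGA", "TTATCTCCGCTGAATTGAGGG"),
}

# (wild-type codon, mutant codon) for each substitution.
MUTATIONS = {"D39N": ("GAT", "AAT"), "S56A": ("TCG", "GCG")}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def antisense_matches_sense(sense: str, antisense: str) -> bool:
    """True if the antisense primer anneals to the 5' portion of the sense primer."""
    return sense.startswith(reverse_complement(antisense))


def amino_acid_change(wild_type: str, mutant: str) -> str:
    """Describe the residue change produced by a codon substitution."""
    return f"{CODON_TABLE[wild_type]}->{CODON_TABLE[mutant]}"


if __name__ == "__main__":
    for name, (sense, antisense) in PRIMER_PAIRS.items():
        wild_type, mutant = MUTATIONS[name]
        print(
            name,
            "pair complementary:", antisense_matches_sense(sense, antisense),
            "| mutant codon in sense primer:", mutant in sense,
            "| residue change:", amino_acid_change(wild_type, mutant),
        )
```

The check on the mutant codon is a simple substring test, since the reading frame of the primers is not stated in the text.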
Expression and Purification of Recombinant Proteins-The E. coli M15 strain was transformed with the constructs pQRHL1Tc, pQRHD39NL1Tc, and pQRHS56AL1Tc and the E. coli TOP3 strain with the pQRH2L1Tc vector. The recombinant proteins were overexpressed by inducing cultures grown to OD600 = 0.6 with 1 mM IPTG for 3 h at 37°C and were purified to homogeneity by Ni2+ affinity chromatography. For this purpose, cell pellets were resuspended in buffer containing 50 mM sodium phosphate and 300 mM sodium chloride, pH 7.4, incubated with 1 mg/ml lysozyme for 15 min, sonicated for 5 min, and centrifuged for 10 min at 12,000 × g at 4°C. Soluble fractions were incubated with Ni2+-NTA-agarose resin (Qiagen) for 1 h at room temperature. Subsequently, the resins were washed three times with 20 ml of buffer containing 50 mM sodium phosphate, 300 mM sodium chloride, 20% glycerol, and 5 mM imidazole, pH 7.4, and the recombinant proteins were eluted with 250 mM imidazole. Fractions containing the recombinant proteins were diluted to 25 mM imidazole and repurified to homogeneity. Protein concentration was determined by the Bradford method (19).
The β-galactosidase α-peptide (β-gal) [32P]RNA/DNA hybrid was used as heterologous substrate. The DNA template for RNA synthesis was obtained by PCR using CI1 (5′-TAATACGACTCACTATAGGGTATGCTTCCGGCTCGTAT-3′) and P4 (5′-TATGCATCTATGCGGCATCAGAGCA-3′) oligonucleotides. Primer CI1 contains at its 5′ end the T7 RNA polymerase promoter sequence (underlined). Plasmid pASW was used as template. pASW is a derivative of the pUC19 plasmid, which contains the 14-nucleotide-long wild type target of the (−)sTRSV (negative polarity satellite RNA of the tobacco ring spot virus) hairpin ribozyme cloned into the coding region of the enzyme β-galactosidase (provided by Drs. A. Berzal-Herranz and A. Barroso-delJesus). 2 µg of the amplified DNA were used for in vitro transcription using T7 RNA polymerase (see above). The 414-nt-long RNA was radiolabeled and purified, and the specific activity was measured as described for L1Tc-(1203-1352) [32P]RNA. The p1 oligonucleotide 5′-TGAATTCAAACAGGACTGTCAGAGCTCGGTACCCGGGGA-3′, complementary to nucleotides 119-158 in the β-galactosidase [32P]RNA, was hybridized to radiolabeled RNA as described above except that 0.25 pmol of β-galactosidase [32P]RNA and 5 pmol of p1 oligonucleotide were used.
RNase H Activity Assays-The nonspecific RNase H assay, using the [3H]poly(rA)/poly(dT) hybrid as substrate, was performed essentially as described (20) except that 1 µCi/ml hybrid in a final volume of 25 µl was added to the reaction. The master reaction buffer (25 mM Tris-HCl, pH 8, 5 mM magnesium chloride, 90 mM sodium chloride, 1.5% glycerol, 50 µg/ml bovine serum albumin, 0.01% Nonidet P-40, 1 mM dithiothreitol, and 20 units of RNasin (Promega)) was modified in order to analyze the pH effect and the metal dependence of the RHL1Tc protein. 25 ng of each enzyme were used, except in the E. coli RNase H (Roche) assay, where 2.5 ng were used. Incubation time was 60 min at 37°C except when otherwise indicated.
The RNase H activity assay on specific homologous and heterologous radiolabeled substrates was carried out by mixing, respectively, 5 µl of each mentioned [32P]RNA/DNA hybrid with 7.5 µl of RNase H buffer (83.3 mM Tris-HCl, pH 8, 0.033% Nonidet P-40, 116.6 mM sodium chloride, 3.3 mM dithiothreitol, 8.3 mM magnesium chloride) and 25 ng of each enzyme. Reactions (12.5 µl) were incubated at 37°C for 30 min and stopped with the addition of 2× stop solution (92% formamide, 17 mM EDTA, 0.025% xylene cyanol, 0.025% bromphenol blue) (v/v). Finally, samples were denatured at 95°C for 2 min, cooled on ice for 5 min, and electrophoresed through a denaturing 6% polyacrylamide, 8 M urea gel. The gel was wrapped in plastic and exposed to Kodak X-Omat autoradiographic film.
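The odd-looking stock concentrations of the RNase H buffer become round numbers once the 7.5 µl of buffer is diluted into the 12.5-µl reaction. The sketch below works through that dilution. It assumes that the hybrid and enzyme solutions contribute none of the listed components, which the text does not state; the names are illustrative.

```python
# Final concentrations in the 12.5 µl reaction, from the stock values in the text.
# Assumption: the hybrid and enzyme solutions add none of these components.

BUFFER_VOLUME_UL = 7.5
REACTION_VOLUME_UL = 12.5

STOCK = {
    "Tris-HCl (mM)": 83.3,
    "Nonidet P-40 (%)": 0.033,
    "sodium chloride (mM)": 116.6,
    "dithiothreitol (mM)": 3.3,
    "magnesium chloride (mM)": 8.3,
}


def final_concentration(stock: float,
                        buffer_ul: float = BUFFER_VOLUME_UL,
                        total_ul: float = REACTION_VOLUME_UL) -> float:
    """Dilute a stock concentration by the buffer-to-reaction volume ratio."""
    return stock * buffer_ul / total_ul


FINAL = {name: final_concentration(value) for name, value in STOCK.items()}

if __name__ == "__main__":
    for name, value in FINAL.items():
        print(f"{name}: {value:.3g}")
```

Under this assumption the reaction contains approximately 50 mM Tris-HCl, 0.02% Nonidet P-40, 70 mM sodium chloride, 2 mM dithiothreitol, and 5 mM magnesium chloride.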

Presence of an RNase H Domain in L1Tc Retrotransposon of T. cruzi-Sequence analysis of different genomic L1Tc copies from a T. cruzi library constructed in pBAC vector showed that in some clones the L1Tc element is formed by a single open reading frame of 5 kb (15). Remarkably, a sequence with significant amino acid homology to enzymes with RNase H activity from E. coli and HIV-1 was found (Fig. 1B) in these copies downstream of the reverse transcriptase (RT) domain (Fig. 1A). Thus, a 139-amino acid region present in the L1Tc copy, which was contained in the pBAC14 clone (15), showed 26.9 and 31.6% identity with the RNase H coding sequence from HIV-1 and E. coli (Z-score of 8 and 6, respectively). Moreover, the amino acids forming the active site (essential for catalysis of the E. coli RNase H enzyme) as well as the neighboring region of the catalytic site (21,22) are conserved in the RNase H protein encoded by L1Tc (Fig. 1B).
To determine whether the L1Tc region containing the putative RNase H domain codes for a protein with RNase H activity, a 603-bp DNA fragment from pBAC14 containing the L1Tc RNase H domain was cloned into the SacI and SalI sites of the pQE30 vector. In addition, two copies of the amplified fragment were also cloned in-frame into the pQE30 vector. The resulting vectors, pQRHL1Tc and pQRH2L1Tc, respectively, were transformed into E. coli M15 and TOP3 strains, and the recombinant proteins were overexpressed after IPTG induction. Two intensely stained bands of ~25 and 40 kDa that corresponded, respectively, to the expected sizes of the RHL1Tc and RH2L1Tc recombinant proteins were observed in bacterial extracts following SDS-PAGE and Coomassie Blue staining (Fig. 2, lanes 1 and 5). To strongly correlate the putative RNase H activity with the protein encoded by the L1Tc RNase H domain, two clones coding for mutated proteins were generated, one at the conserved aspartic acid at position 39 of the enzyme catalytic site (RHD39NL1Tc) and the other at the non-conserved serine at position 56 (RHS56AL1Tc). The mutated recombinant proteins were overexpressed in the M15 E. coli strain by IPTG induction, showing after SDS-PAGE and Coomassie Blue staining the expected size of 25 kDa. The recombinant proteins were solubilized under native conditions and purified to homogeneity by Ni2+ affinity chromatography (Fig. 2, lanes 2-4 and 6).
RNase H Enzymatic Activity-The RNase H enzymatic activity of the purified proteins was first measured on a linear [3H]poly(rA)/poly(dT) hybrid using the optimal conditions described for the HIV-1 RNase H enzyme (HIV-1 RT protein) (20). The measured acid-soluble radioactivity (Fig. 3) resulting from incubation with the RHL1Tc protein showed that the recombinant protein encoded by the L1Tc element has RNase H activity with an RNA/DNA hybrid substrate. The mutation at the non-conserved serine residue at position 56 did not affect the RNase H activity of the protein. However, the mutation at the aspartic acid present in the presumed RNase H catalytic site of the L1Tc RNase H protein abolished enzymatic activity. From the molar concentration of the RNase H proteins used in the assays, it was calculated that the activity of the RNase H enzyme encoded by L1Tc is 40- and 15-fold lower, respectively, than the activity of the RNase H enzymes of E. coli and HIV-1.
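A comparison on a molar basis of this kind amounts to dividing each assay signal by the picomoles of enzyme present. The following is a hedged sketch of that normalization, not the authors' calculation: the function names are illustrative, and the signal values in the usage example are arbitrary placeholders rather than data from this work. The only values taken from the text are the 25 ng of enzyme per assay and the ~25-kDa size of RHL1Tc.

```python
# Molar normalization of enzyme activity (illustrative sketch).
# Signal values below are arbitrary placeholders, NOT measurements from the paper.


def picomoles(mass_ng: float, molecular_mass_kda: float) -> float:
    """Convert a protein mass in ng to pmol (ng divided by kDa gives pmol)."""
    return mass_ng / molecular_mass_kda


def fold_lower(reference_signal: float, reference_pmol: float,
               test_signal: float, test_pmol: float) -> float:
    """How many times lower the per-pmol activity of the test enzyme is."""
    reference_activity = reference_signal / reference_pmol
    test_activity = test_signal / test_pmol
    return reference_activity / test_activity


if __name__ == "__main__":
    # 25 ng of the ~25-kDa RHL1Tc protein, as used in the assays.
    rhl1tc_pmol = picomoles(25, 25)
    print("RHL1Tc per assay (pmol):", rhl1tc_pmol)

    # Placeholder example: a reference enzyme at 0.5 pmol giving a signal of 400
    # versus a test enzyme at 1 pmol giving a signal of 100 (arbitrary units).
    print("fold lower:", fold_lower(400, 0.5, 100, rhl1tc_pmol))
```

The molecular masses of the reference enzymes are left as inputs because they are not given in this section.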
The enzymatic activity of the protein encoded by the L1Tc RNase H domain was also analyzed by monitoring its cleavage efficiency on homologous and heterologous uniformly labeled RNA transcripts hybridized to an internal complementary oligonucleotide. Two different labeled RNA/DNA hybrids were used. One corresponded to the coding region of β-galactosidase, which does not contain significant sequence homology with the L1Tc retrotransposon. The other one corresponded to an internal region of L1Tc. Thus, a 414-nt RNA fragment of β-galactosidase was annealed to a complementary 39-nt oligodeoxynucleotide (Fig. 4A1) and incubated with RHL1Tc, RHD39NL1Tc, RHS56AL1Tc, RH2L1Tc, and the E. coli and HIV-1 RNase H proteins. The purified recombinant protein encoded by the L1Tc RT domain (RTL1Tc) and a reaction without enzyme were used as negative controls. Analysis of the RNA cleavage products on a denaturing 6% polyacrylamide, 8 M urea gel showed that the RHL1Tc, RHS56AL1Tc, and RH2L1Tc proteins cut the labeled RNA strand of the RNA/DNA hybrid, releasing, like the proteins from E. coli and HIV-1, two major RNA fragments of ~120 and 260 nucleotides, which correspond to the ends of the RNA strand not annealed to the oligonucleotide (Fig. 4A2, lanes 4, 5, and 8). However, as occurs with nonspecific substrates, the RHD39NL1Tc mutant protein did not show enzymatic activity (Fig. 4A2, lane 9). Slight differences were observed in the size of the fragments generated by each enzyme because of heterogeneity in the cleavage sites of the RNase H proteins assayed. The RHL1Tc and RH2L1Tc proteins showed the same cleavage pattern, although RH2L1Tc hydrolyzed the substrate more efficiently than RHL1Tc. As expected, no RNase H activity was detected in the reaction containing the RT protein encoded by the L1Tc RT domain or in the reaction to which no enzyme was added (Fig. 4A2, lanes 3 and 6).
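The observed fragment sizes can be compared with those expected if cleavage is confined to the annealed region. The sketch below is a simplified model under that assumption, with illustrative names; it uses the 414-nt transcript length and the annealing coordinates given under "Experimental Procedures" (nucleotides 119-158 of the β-galactosidase RNA), treated as 1-based and inclusive.

```python
# Expected sizes of the RNA ends flanking the RNA/DNA hybrid region.
# Simplified model: RNase H cuts only within the annealed segment, so the
# single-stranded 5' and 3' ends are released intact.


def flanking_fragments(rna_length: int, hybrid_start: int, hybrid_end: int):
    """Return (5' fragment, 3' fragment) lengths in nucleotides.

    hybrid_start and hybrid_end are 1-based, inclusive RNA positions.
    """
    five_prime = hybrid_start - 1
    three_prime = rna_length - hybrid_end
    return five_prime, three_prime


if __name__ == "__main__":
    # beta-galactosidase substrate: 414-nt RNA, oligonucleotide at nt 119-158.
    print(flanking_fragments(414, 119, 158))
```

For the β-galactosidase substrate this gives ends of 118 and 256 nt, in line with the two major fragments of roughly 120 and 260 nucleotides seen on the gel. Cuts at different positions within the hybrid would shift these sizes by a few nucleotides.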
In order to test RNase H activity with a homologous substrate, an L1Tc RNA/DNA hybrid molecule was formed between a 227-nt-long transcript derived from L1Tc (see "Experimental Procedures") and a 21-nt-long complementary oligonucleotide. The uniformly labeled RNA fragment/DNA hybrid was incubated with different RNase H proteins (Fig. 4B1). Analysis of the RNA cleavage products generated by the RHL1Tc, RHS56AL1Tc, and RH2L1Tc proteins showed two major fragments of ~110 and 95 nt corresponding to the non-annealed ends of the RNA strand (Fig. 4B2, lanes 4, 5, and 8).
As with the β-galactosidase hybrid, the RHD39NL1Tc mutant protein did not show RNase H activity on the L1Tc-homologous substrate (Fig. 4B2, lane 9). Moreover, the RH2L1Tc protein also cut the L1Tc-(1203-1352) RNA/DNA hybrid substrate more efficiently than the RHL1Tc protein did. The E. coli RNase H protein efficiently cleaved the hybrid, releasing two major fragments (Fig. 4B2, lane 1) similar to those released by the RNase H encoded by L1Tc. However, the cleavage pattern of the HIV-1 RT protein was different (Fig. 4B2, lane 2), probably because of particular structure or sequence determinants present in the L1Tc-(1203-1352) RNA/DNA hybrid.
Enzymatic Requirements of RHL1Tc-To determine the ion requirements of the RHL1Tc enzyme, RNase H activity was analyzed at different Mg2+ and Mn2+ ion concentrations at pH 8. With the [3H]poly(rA)/poly(dT) hybrid as substrate, the RHL1Tc protein required Mg2+ or Mn2+, with an optimal concentration of 5 mM Mg2+ or 1 mM Mn2+ (Fig. 5, A and B). The RNase H activity level in the presence of 1 mM Mn2+ ion was ~40% lower than that detected in the presence of 5 mM Mg2+ (Fig. 5C). To determine the RHL1Tc optimal temperature and pH conditions, RNase H activity was measured at different temperature and pH values in the presence of 5 mM Mg2+. As occurs with the E. coli and HIV-1 enzymes, the maximal RNase H activity of the RHL1Tc protein was observed at 37°C and pH 8. Remarkably, however, the RNase H enzyme encoded by the L1Tc element was active over a wider range of pH values and temperatures than the HIV-1 and E. coli enzymes. Thus, the RHL1Tc protein showed significant RNase H activity at 19°C and pH 6 (Fig. 6, A and B). Under these conditions the HIV-1 and E. coli RNase H enzymes were inactive.

DISCUSSION
RNase H enzymes are widely distributed among prokaryotes and eukaryotes (24), and it is believed that they participate in DNA replication processes by removing the upstream RNA primers of Okazaki fragments (25). During the retrotransposition event of retroviruses and retrovirus-like elements, an RNase H activity encoded by the elements themselves removes the RNA template of the generated RNA/cDNA hybrid in order to allow second strand DNA synthesis. It has been suggested that in the case of LINEs the RNase H activity needed for completing the retrotransposition process could be provided by the host cell (12). A putative RNase H domain has recently been reported (13) in some non-LTR retrotransposons, in a position analogous to that described for retroviruses and LTR retrotransposons. However, it has not yet been demonstrated that the putative RNase H domain is endowed with enzymatic activity. Sequence analysis has shown that a 603-bp-long region from an L1Tc genomic copy has a significant degree of sequence similarity with different RNase H proteins and that the amino acids forming the active site described for the RNase H enzyme family (22) are conserved at similar positions. This sequence has been cloned into an expression vector, overexpressed, and purified to homogeneity. In this study, we report that the purified RHL1Tc recombinant protein containing the L1Tc RNase H domain has RNase H activity on various RNA/DNA hybrid substrates. Remarkably, the RHD39NL1Tc mutated protein, which presents a substitution at the conserved aspartic acid residue at position 39, is devoid of cleavage activity. This result indicates that this residue is also essential for enzymatic activity of the L1Tc-encoded enzyme, as has been described for the E. coli, human, and retrovirus RNase H proteins (26). On the other hand, as expected, the RHS56AL1Tc mutated protein, bearing a substitution at the non-conserved serine at position 56, is as active as the RHL1Tc wild type recombinant protein.
The enzymatic activity detected for the RHL1Tc protein was similar to that described for some retroelements, such as the endogenous HERV-K retrovirus (27), and significantly lower than that observed for the E. coli RNase H and HIV-1 RT enzymes. It is worth mentioning that other isolated RNase H domains from retroviruses have been shown to present a lower enzymatic activity than the protein containing both RT and RNase H domains (28,29). On the other hand, in HIV-1 (30) but not in MuLV (31), the RNase H domain is not an active protein by itself. It has been suggested that the presence of a high content of basic residues in some regions of the protein could be necessary for activity. The MuLV protein contains 23% basic amino acids in the α-C, -B, and -D regions, the handle region, whereas the HIV-1 protein only has 3.6% (23,29,31). It has been suggested that these basic amino acids function in substrate affinity and RNase H stability (32). The isolated HIV-1 RNase H domain shows increased activity when a polyhistidine tag is added (28,29). Although the basic amino acid content in the putative handle region of the isolated L1Tc RNase H domain is double that present in the HIV-1 RNase H protein, we cannot exclude a contribution to the detected enzymatic activity by the hexahistidine tag that the RHL1Tc recombinant protein carries at its N-terminal end.
The RNase H protein encoded by the L1Tc element is Mg2+-dependent; it is also active in the presence of the Mn2+ ion. The optimal pH and temperature conditions are pH 8 and 37°C. These conditions are similar to those described for the E. coli RNase H and HIV-1 RT enzymes. However, it cannot be excluded that the optimal conditions of the enzyme could be different for a protein containing both RT and RNase H domains, as has been described in retroviruses (29,33). Remarkably, the RHL1Tc protein is more temperature and pH permissive than the HIV-1 RT and E. coli proteins, presenting significant enzymatic activity at 19°C and pH 6. Similar temperature permissiveness has been reported for the endogenous RNase HI enzyme of the related parasite Trypanosoma brucei (34). The wide temperature permissiveness of the enzyme encoded by the L1Tc element may be related to the life cycle of the parasite because it involves three different and diverse environmental conditions within the hosts.
The cleavage pattern generated by the RHL1Tc protein after incubation with the 414-nt RNA fragment of the β-galactosidase/DNA hybrid as substrate was very similar to that released by the RNase H enzymes from E. coli and HIV-1. However, when the 227-nt RNA fragment of the L1Tc-(1203-1352)/DNA hybrid was used as substrate, the cleavage pattern released by the RNase H enzymes from L1Tc and E. coli differed from that released by the RNase H enzyme from HIV-1. This divergent pattern obtained with the HIV-1 RT protein may be due to the presence, in the hybridized L1Tc RNA region, of a 5-nt-long purine stretch that could be acting as a polypurine tract (PPT)-like sequence. It has been reported that purine-rich sequences can act as PPT-like sequences for the RNase H enzyme from HIV-1 containing the RT domain (35).
Remarkably, it has been observed that the recombinant protein RH2L1Tc containing two in-frame copies of the RNase H domain cleaves the substrates more efficiently than the RHL1Tc protein. The higher activity of the dimeric protein could be caused by a special or more active protein conformation, or perhaps the enzyme binds the substrate as a dimer, as has been reported for the MuLV enzyme (36). Further studies will be necessary to clarify the role that the conformational structure of the L1Tc RNase H protein plays in RNase H activity. Our discovery of RNase H enzymatic activity associated with the L1Tc RNase H domain should contribute to understanding the LINE retrotransposition mechanism and should reinforce the idea of the autonomy of such retroelements, as they encode the enzymatic activities involved in their own transposition processes.