The Open Reading Frame 1 of the L1Tc Retrotransposon of Trypanosoma cruzi Codes for a Protein with Apurinic-Apyrimidinic Nuclease Activity

The deduced amino acid sequence of open reading frame 1 (ORF1) of L1Tc, a non-site-specific non-long terminal repeat retrotransposon of Trypanosoma cruzi, exhibits significant homology with the consensus sequence of the class II family of apurinic-apyrimidinic (AP) endonucleases. Analysis of the 40-kDa recombinant protein, named NL1Tc, obtained by expressing the L1Tc ORF1 in an Escherichia coli "in vitro" expression system, revealed that the sequence codes for a protein with endonuclease activity specific for AP sites. Data are also presented showing that in vivo expression of the NL1Tc protein conferred viability by complementation to E. coli exonuclease III deletion mutants (BW286 strain). We propose that the biological function of the AP endonuclease activity of the NL1Tc protein may be to introduce into the DNA free 3′ ends that could be used as primers for the integration of the L1Tc element along the T. cruzi genome, and that this nicking could be a general mechanism for the retrotransposition of non-site-specific non-long terminal repeat retrotransposons.

Non-LTR retrotransposons represent a particular family of transposable elements that are present in the majority of organisms from mammals to fungi. Although they may be considered the oldest group of retroelements present in eukaryotes (1), their biological role is still unknown. These elements are flanked by variable length target site duplications; they lack long terminal repeats (LTRs) and have variable length poly(A) or A-rich 3′ tails (2). Given the critical functional role that the LTRs play in the retrotransposition of retroviruses and LTR retrotransposons, it is most likely that the non-LTR retrotransposons use an integration system different from the double-stranded linear DNA integration system proposed for retroviruses and LTR elements (3). The non-LTR integration model, originally proposed by Schwarz-Sommer et al. (4), Finnegan (5), and Bucheton (6), and developed by Eickbush (1), postulates that reverse transcription and second strand synthesis must occur at the chromosomal insertion site using free DNA 3′ ends as primers. In site-specific non-LTR elements that generate a target site duplication of specific length, it is assumed that the free 3′ ends in the chromosomal DNA are generated by the DNA-binding protein, which also has an associated endonuclease activity (1). In fact, an endonuclease activity has been found to be coded by a site-specific non-LTR element, the R2 element from Bombyx mori (R2Bm) (7), which when expressed in Escherichia coli is able to specifically cleave the 28 S rRNA transcript (8). For the integration of non-site-specific non-LTR retrotransposons into the genome, however, the DNA-binding protein is thought simply to associate with breaks already existing on the chromosome (1), since it has been assumed that the element does not have any associated endonuclease activity.
Recently, we have described a Trypanosoma cruzi non-LTR retrotransposon, named L1Tc, that is distributed throughout the genome in a high copy number (9,10) and that, like the Trypanosoma brucei Ingi element, has three ORFs. Two of the ORFs show significant homology with the reverse transcriptase and cysteine motifs of the pol and gag genes, respectively, from retroviruses and LTR retrotransposons. Interestingly, the analysis of the sequence of ORF1 revealed significant homology with certain domains of the AP family of DNA repair enzymes (10). The existence of conserved domains between the AP family of proteins and certain non-LTR retrotransposons seems to be a general feature common to all non-site-specific elements (11). In the present paper we show that the protein coded by ORF1 has endonuclease activity and that it is specific for apurinic/apyrimidinic sites. We also present evidence indicating that the protein allows the survival of E. coli BW286 mutants lacking Exo III activity. This evidence suggests that the AP activity coded by the L1Tc non-LTR retrotransposon of T. cruzi may play an essential role in the mechanism of integration of these types of elements at variable sites in the genome.

EXPERIMENTAL PROCEDURES
Molecular Cloning and Expression of the ORF1 of L1Tc-The oligonucleotides used to amplify a 1.03-kilobase DNA fragment by polymerase chain reaction were 5′-ACCAGCTCGAGCCATTTACAT and 3′-GCCTGGTACCGTCCTTGTGCA. For the polymerase chain reaction amplification, 1 μg of DNA from the pSPFM55 plasmid containing the L1Tc element (10) was used. Thirty amplification rounds consisting of 94°C for 60 s, 65°C for 90 s, and 72°C for 90 s were employed. The first cycle was preceded by a 3-min denaturation step, and a 5-min elongation step at 72°C was added at the end of the reaction. The DNA fragment was cloned into the pTrcHisA vector (Stratagene) using AvaI and KpnI sites at the 5′ and 3′ ends, respectively. The AvaI and KpnI sites were generated ad hoc in the amplified fragment. The resulting plasmid was called pHisNL1Tc (Fig. 2A). The recombinant protein was expressed in the E. coli BW528 strain (genotype Δ(xth-pnc), nfo1::kan), which lacks endogenous Exo III and endonuclease IV (provided by B. Weiss, University of Michigan, Ann Arbor). The cells (500 ml, A600 = 0.6) transformed with the pHisNL1Tc and pTrcHisA vectors were induced with 2 mM IPTG for 3 h. To purify the recombinant protein, the cultures were chilled at 4°C and centrifuged at 4000 × g for 15 min at 4°C. The cells, suspended in 10 ml of buffer containing 50 mM sodium phosphate and 300 mM NaCl, pH 6, were incubated with 0.2 mg/ml lysozyme for 15 min, sonicated for 5 min, and centrifuged for 10 min at 12,000 × g at 4°C. The supernatant was passed several times through a 0.45-μm filter (Millipore) and then incubated overnight at 4°C with 750 μl of Ni2+-NTA-agarose resin (Qiagen) containing 10 mM β-mercaptoethanol. It was then loaded into a column (5 mm in diameter) and washed several times with 50 mM NaH2PO4, pH 6, containing 300 mM NaCl and 30% glycerol. The recombinant protein was eluted with an imidazole gradient (0.05 to 0.5 M).
The NL1Tc protein eluted at an imidazole concentration of 0.25-0.35 M in 2 ml. The sample was then passed through a Mono S column in a fast protein liquid chromatography system (Pharmacia Biotech Inc.) with the following mobile phases: buffer A, 50 mM sodium acetate, pH 5; buffer B, 50 mM sodium acetate, pH 5, containing 1 M NaCl. Elution was carried out with a 0-100% gradient in 30 ml at a flow rate of 1 ml/min. The eluted protein was concentrated with Centriprep-10 and Centricon-10. The concentration of the protein present in the eluted fractions was determined by the Bradford method (12).
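As a reading aid, the thermal profile given in the cloning subsection above can be written down as a small data structure and its programmed hold time summed. This is only a sketch: the step names are labels chosen here, the initial denaturation temperature of 94°C is an assumption (the text states only its 3-min duration), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # descriptive label (not from the source text)
    temp_c: float    # block temperature in degrees Celsius
    seconds: int     # hold time in seconds


# Per-cycle steps exactly as stated in the text.
CYCLE = (
    Step("denaturation", 94, 60),
    Step("annealing", 65, 90),
    Step("extension", 72, 90),
)
N_CYCLES = 30

# Assumption: the 3-min initial denaturation is run at 94 C.
INITIAL = Step("initial denaturation", 94, 180)
# Final 5-min elongation at 72 C, as stated in the text.
FINAL = Step("final elongation", 72, 300)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ramping between steps is not counted."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


if __name__ == "__main__":
    total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"programmed hold time: {total} s ({total / 60:.0f} min)")
```

Under these assumptions the programmed hold time is 7680 s, or 128 min.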
Nuclease Activity of the NL1Tc Protein-The AP endonuclease activity was assayed by addition of the purified recombinant protein to a short double-stranded DNA fragment containing a single AP site. A 37-base oligonucleotide containing a unique uracil residue was 5′-32P-labeled for use as a substrate in this AP endonuclease assay. The oligonucleotide was incubated at 30°C overnight in 10 mM Tris-HCl, pH 8, 50 mM EDTA, containing 3 units of uracil-N-glycosylase. The assay was carried out following the methodology described by Sander and Huang (13). Likewise, the endonuclease activity of the NL1Tc recombinant protein was analyzed on partially depurinated supercoiled plasmid DNA and on untreated plasmid DNA. The partially depurinated plasmid DNA was prepared by heat-acid treatment as described by Oyo et al. (14). Supercoiled pUC8 plasmid DNA (23 μg) in 120 μl of 37.5 mM sodium citrate, pH 3.5, was incubated at 60°C for 15 min. After the incubation period the mixture was chilled to 0°C and dialyzed against 50 mM Tris-HCl, pH 7.5, for 3 h and then against distilled water. The dialyzed DNA solution was stored at −70°C. The biochemical activity of the protein was assayed on 180 fmol of DNA suspended in 20 μl of 50 mM Tris-HCl, pH 7.5, buffer containing 50 μg/ml bovine serum albumin and 5 mM MgCl2. The DNA was incubated for 30 min at 30°C with the 40-kDa purified protein. The reaction was then stopped by chilling at 0°C. The conformational change of the DNA was analyzed by 0.8% agarose gel electrophoresis as described by Sander et al. (15). Exo III from E. coli (Boehringer Mannheim) was used as a control of AP nuclease activity in the presence of 200 mM NaCl, a condition that inhibits its 3′-5′ exonuclease activity (13).
Phylogenetic Trees-The phylogenetic tree was constructed using the TREECON program (17). The comparison was made using the conserved domains of 6 AP nucleases and the 17 non-LTR retrotransposon domains previously described by Martín et al. (11). The neighbor-joining method (18, 19) was used for construction of the tree. Distance matrices were calculated from the multiple alignment using the percentage of amino acid identity between the sequences. The distances were corrected for multiple substitutions using the Poisson process model. Confidence intervals were calculated using 200 bootstrap replicates.
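The distance calculation described above can be illustrated with a short sketch. It assumes the standard form of the Poisson correction, d = -ln(1 - p), where p is the fraction of aligned positions that differ; it does not reproduce TREECON itself, and the function names and the handling of gaps (gapped positions are skipped) are choices made here for illustration only.

```python
import math
import random


def p_distance(seq_a, seq_b):
    """Fraction of ungapped aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions in common")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) for multiple substitutions."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # no identical positions: distance is undefined/infinite
    return -math.log(1.0 - p)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the paper.
    toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACNEYGHIRV"]
    rng = random.Random(0)
    replicate = bootstrap_alignment(toy, rng)
    print(poisson_distance(toy[0], toy[1]))
    print(poisson_distance(replicate[0], replicate[2]))
```

Each bootstrap replicate would yield its own distance matrix and neighbor-joining tree; the fraction of replicates that recover a given branch gives its bootstrap support.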

AP-like Endonuclease Domains-The comparison of the consensus sequence of the AP proteins described by Seki et al. (20) with the deduced amino acid sequence of ORF1 of the L1Tc element showed that 30 of the 103 residues given for the consensus sequence of the AP proteins are identical in L1Tc ORF1. A detailed analysis of the sequence detected three domains (α, β, and γ) that have a homology higher than 65% when compared with the same domains of the AP consensus sequence (Fig. 1). Since the domains that define the active sites of Exo III (21) seem to be maintained in L1Tc ORF1, we hypothesized that L1Tc ORF1 may code for a protein that also has Exo III-like activity.
ORF1 of the L1Tc Element Codes for a Protein with Nuclease Activity-To determine whether the protein presumably coded by ORF1 of the L1Tc element has any nuclease activity, the ORF was cloned into the AvaI and KpnI sites of the pTrcHisA expression vector (see "Experimental Procedures" and Fig. 2A). The protein fragment coded by the cloned sequence starts at the glutamic acid residue located downstream from the first methionine of L1Tc ORF1. We excluded from the cloned fragment the DNA sequence of L1Tc ORF1 coding for the first 49 amino acids, since it has no homology with the consensus sequence of the AP proteins. The 3′ end of the cloned fragment maps 31 nucleotides downstream from the stop codon of ORF1. Thus, the resulting pHisNL1Tc plasmid contains an ATG initiation codon, a DNA sequence coding for six histidine residues, and the polymerase chain reaction DNA fragment from the ORF. The lacO signal allows overexpression of the recombinant protein upon IPTG induction. Fig. 2B shows the profile of the proteins expressed in an E. coli (strain BW528) expression system after transformation with the recombinant pHisNL1Tc vector. The protein profile of the pHisNL1Tc-transformed bacteria after incubation with 2 mM IPTG for 3 h revealed an intensely stained band of approximately 40 kDa, which corresponds to the size of the NL1Tc recombinant protein (Fig. 2B, lane 1). Fig. 2B, lanes 2 and 3, shows the Coomassie Blue-stained purified recombinant protein after passing through the affinity column and the cation exchange column, respectively.
The enzymatic AP activity of the purified protein was measured by specific hydrolysis of a double-stranded DNA fragment containing a unique AP site at residue 23. Hydrolysis of this fragment at the AP site would generate a labeled 22-mer fragment (Fig. 3A, lane 2). Likewise, the potential of the recombinant NL1Tc protein to generate nicked (relaxed) forms in supercoiled plasmids containing AP sites was analyzed. The supercoiled plasmids containing the AP sites were prepared by heating supercoiled pUC8 DNA at pH 3.5, as indicated under "Experimental Procedures." Fig. 3B shows that the 40-kDa purified protein is capable of nicking the partially depurinated supercoiled plasmids. As expected, the non-depurinated plasmids remained supercoiled. As a further control, we incubated partially depurinated plasmids with the fractions eluted, under conditions similar to those described for the recombinant vector, from extracts of E. coli BW528 transformed with the pTrcHisA vector. Fig. 3B, lane 5, shows that these fractions do not have any nicking activity.
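The expected size of the labeled product follows from simple arithmetic, sketched below. The sketch assumes that cleavage occurs immediately 5′ of the AP residue, which is consistent with the 22-mer reported above for an AP site at residue 23 of the 5′-labeled 37-mer; the function name is hypothetical, and the unlabeled 3′ fragment is counted here as including the abasic residue.

```python
def cleavage_products(oligo_len, ap_position):
    """Return (labeled_5prime_len, unlabeled_3prime_len) for a 5'-labeled oligo.

    Assumes a single cut immediately 5' of the AP residue, so the labeled
    fragment ends at residue ap_position - 1 and the unlabeled fragment
    begins with the abasic residue itself.
    """
    if not 1 < ap_position <= oligo_len:
        raise ValueError("AP site must lie inside the oligo, not at residue 1")
    labeled = ap_position - 1
    unlabeled = oligo_len - labeled
    return labeled, unlabeled


if __name__ == "__main__":
    labeled, unlabeled = cleavage_products(oligo_len=37, ap_position=23)
    print(f"labeled 5' fragment: {labeled}-mer; unlabeled 3' fragment: {unlabeled}-mer")
```

For the 37-mer substrate this gives a labeled 22-mer and an unlabeled 15-mer, so only the 22-mer band would be visible by autoradiography.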
The potential biological activity of the NL1Tc recombinant protein was revealed in complementation assays on the lethal double mutant BW286 (Δxth dut-1 genotype) (16). The Δxth mutant has a deletion of the gene coding for the Exo III enzyme, and the dut-1 mutation is a temperature-induced mutation that inactivates expression of deoxyuridine triphosphatase (dUTPase). The absence of dUTPase, an enzyme that catalyzes the conversion of dUTP to dUMP, would cause a very high level of incorrect incorporation of uracil into the genome during DNA replication. Thus, a high number of apyrimidinic sites would arise in the DNA through the action of uracil-DNA glycosylase. The double mutation in BW286 is lethal because these apyrimidinic sites cannot be eliminated in the absence of the Exo III enzyme. The Δxth dut-1 strain, when grown at 37°C, has a viability of <1% relative to that of the wild type. In this context, transformation of BW286 mutants with a gene coding for an AP nuclease would confer viability on these bacteria. The results of the complementation assay, summarized in Table I, show that the pHisNL1Tc plasmid expressing the ORF1 of L1Tc is capable of maintaining the viability of the E. coli BW286 strain having the Δxth dut-1 genotype. The control E. coli strains, of genotypes Δxth dut-1 and Δxth, transformed with the pTrcHisA plasmid showed growth behavior identical to that of the non-transformed strains. It should be stressed that the same level of complementation was obtained whether the experiment was done in the absence or the presence of IPTG. Thus, it is likely that in the absence of IPTG a sufficient amount of the protein is produced in the cell to support its growth.
Phylogeny-To investigate the possible sequence relationships between the 6 most fully described AP proteins and 17 non-site-specific non-LTR retrotransposons, phylogenetic trees were made using the conserved regions of these sequences (11). This analysis showed (Fig. 4) the existence of two major branches on the same tree. We observed that the phylogenetic relationship of the analyzed non-LTR retrotransposons shows no correlation with the phylogenetic relationship of the organisms they inhabit (Fig. 4), although some elements from organisms classed in one family or order, such as the mammalian L1 elements, are closely located. The L1Tc was found close to the Ingi and I factor elements and in a branch different from that of the APE family.

DISCUSSION
The present study shows that the 40-kDa NL1Tc recombinant protein, encoded by the ORF1 of the non-site-specific non-LTR retrotransposon from T. cruzi and expressed in an E. coli system, has AP endonuclease activity, since it is capable of hydrolyzing a 37-mer double-stranded DNA fragment containing an internal AP site and of nicking supercoiled plasmids containing apurinic/apyrimidinic sites. The biochemical data confirmed our hypothesis (11) that ORF1 was likely to be endowed with nuclease activity, a hypothesis based on the alignment between the deduced amino acid sequence of the ORF1 of L1Tc and the consensus sequence reported by Seki et al. (20) for the class II endonucleases. It was interesting to detect three domains in the deduced NL1Tc protein having high homology (higher than 65%) with the domains considered to contain the active site residues of exonuclease III (21). The potential biological role of the NL1Tc protein was suggested by its ability to complement the exonuclease III repair activity in bacteria lacking the gene coding for this enzyme. The protein conferred viability on E. coli BW286 Δxth dut-1 mutants when grown at 37°C.
The phylogenetic analysis made by comparison of the conserved domains of the AP proteins and of those of non-site-specific non-LTR retrotransposons showed the existence of two major branches. The L1 (L1hs, L1ms, L1mm, L1rn, and L1md), the cin4, and the Tad1-1 non-LTR retrotransposon elements are closer in evolution to the AP family of proteins than to the rest of the non-LTR retrotransposons, despite the fact that the AP family lacks retrotransposition characteristics. Our data showed that the L1Tc element is close to the Ingi and the I factor elements and that it is located on a different branch relative to the AP family. We found that the non-site-specific non-LTR retrotransposons have a random distribution along the tree, similar to the distribution reported by Xiong and Eickbush (22) for the non-LTR retrotransposon elements when the tree was based on reverse transcriptase domains. The random distribution obtained reinforces the hypothesis that either the non-LTR retrotransposons have spread horizontally among major taxonomic groups of organisms or their origin predates the evolution of metazoans (1).
Following the mechanisms proposed for integration of the non-site-specific non-LTR retrotransposons, we think that the AP endonuclease activity encoded by the ORF1 of the L1Tc element may well play a key role in the first stage of the transposition mechanism of the element by generating free 3′-OH sites in the chromosomal DNA where the integration of these elements would occur. Since most of the non-site-specific non-LTR retrotransposons have domains similar to those of the ORF1 of L1Tc, it is most likely that these elements may also be endowed with AP activity. The high number of potential AP sites that can be generated along the chromosomal DNA can explain the high copy numbers and dispersion of these elements through the genome of the organisms they inhabit (10,23,24). Recently, Feng et al. (25) reported that the NH2-terminal region of the human L1 ORF2, which we have shown to have high homology with the T. cruzi L1Tc and the nuclease family (11), has nuclease activity but shows no preference for AP sites. In this context, we believe that in T. cruzi the random, preexisting chromosomal DNA nicks postulated to be necessary for retrotransposition would not be required for the integration of these elements.