Crystal structure of the endonuclease domain encoded by the telomere-specific long interspersed nuclear element, TRAS1.

The telomere-specific long interspersed nuclear element, TRAS1, encodes an endonuclease domain, TRAS1-EN, which specifically cleaves the telomeric repeat targets (TTAGG)n of insects and (TTAGGG)n of vertebrates. To elucidate the sequence-specific recognition properties of TRAS1-EN, we determined the crystal structure at 2.4-Å resolution. TRAS1-EN has a four-layered alpha/beta sandwich structure; its topology is similar to that of apurinic/apyrimidinic endonucleases, but the beta-hairpin (beta10-beta11) at the edge of the DNA-binding surface makes an extra loop that distinguishes TRAS1-EN from cellular apurinic/apyrimidinic endonucleases. A protein-DNA complex model suggests that the beta10-beta11 hairpin fits into the minor groove, enabling interaction with the telomeric repeats. Mutational studies of TRAS1-EN also indicated that Asp-130 and the beta10-beta11 hairpin structure are involved in specific recognition of telomeric repeats.

Non-long terminal repeat retrotransposons, also known as long interspersed nuclear elements (LINEs), are transposable elements that encode a reverse transcriptase and insert into genomic locations via RNA intermediates. The recent progress of the human genome project has revealed that one LINE, L1, integrates throughout chromosomes and occupies more than 20% of the genome (1). The integration of L1 may play a role in genetic diseases and cancers (2) and in gene evolution and genome reconstruction (3-5).
The LINEs have been classified into two types according to the number of open reading frames (ORFs) (6). The first type of element has a single ORF and encodes an endonuclease domain near its C terminus; this type of endonuclease (also known as restriction enzyme-like endonuclease) is similar in some residual motifs to various prokaryotic restriction enzymes (7). The second type of element has two ORFs; ORF1 encodes a retroviral Gag-like protein whose function is still unclear, and ORF2 encodes a protein with an endonuclease domain at its N terminus and a reverse transcriptase domain in the center of the ORF. This class of endonuclease domain is made up of about 250 amino acid residues and shares sequence homology with apurinic/apyrimidinic endonucleases (APE), such as human APE1 and Escherichia coli exonuclease III.
Most of the APE-like endonuclease-encoding retrotransposons do not insert in a sequence-specific manner into the host genome; the human L1 elements, for example, cleave AT-rich sequences with low sequence specificity (8). However, several endonuclease-encoding LINEs have very restricted integration targets within the genome (9). TRAS1 and R1Bm, found in Bombyx mori, are typical sequence-specific elements, which insert between T and A of the (TTAGG)n telomeric repeat and at a specific site in the 28 S rDNA of B. mori, respectively (10, 11). Recent studies have shown that the endonuclease domain in the ORF2 of TRAS1 (12) and R1Bm (13) determines its own target sequence recognition and DNA cleavage activity.
The amino acid identity between human APE1 and the endonuclease domain of TRAS1 (TRAS1-EN) is not very high (11%), but in general the catalytic residues are well conserved between APE1 and LINE-ENs. Most of the APE-like endonucleases in LINEs, however, cannot cleave apurinic/apyrimidinic (AP) sites (8, 14). In addition, it is hypothesized that LINE-ENs primarily cleave the bottom strand (e.g. TTAGG for TRAS1) followed by the top strand (CCTAA), supporting the target-primed reverse transcription model, which is a system unique to LINEs (13, 15-17). At present, however, little is known about the mechanisms of sequence recognition and digestion by endonuclease domains during the process of target-primed reverse transcription. These features of the APE-like endonucleases of LINEs, which are quite different from those of the cellular AP endonucleases, are yet to be elucidated.
Here, we determined the crystal structure of TRAS1-EN at 2.4-Å resolution. TRAS1-EN forms a four-layered α/β-sandwich, and its topology is similar to that of the cellular APEs studied so far. The β-hairpin formed by β10-β11 of TRAS1-EN projects to the edge of the DNA-binding surface, which distinguishes it from the other known AP endonuclease structures. The mutational studies also suggested that this β-hairpin and Asp-130 are involved in the telomere sequence specificity of TRAS1-EN.

* This work was supported by grants from the Ministry of Education, Science, and Culture of Japan and by a grant-in-aid from the Research for the Future Program of the Japan Society for the Promotion of Science. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

EXPERIMENTAL PROCEDURES
Protein Purification and Crystallization-The TRAS1-EN expression vector (pHisT1EN) was constructed and transfected into E. coli BL21(DE3)/pLysS strain as described previously (16). The His10-tagged TRAS1-EN was purified on a Ni2+-bound HiTrap chelating column (Amersham Biosciences), followed by Factor Xa digestion. The sample solution was further purified by cation and anion exchange column chromatography, resulting in highly purified TRAS1-EN. The protein was concentrated in sample buffer (50 mM Tris-Cl (pH 8.0), 250 mM NaCl, 5 mM dithiothreitol, 1 mM EDTA) to give a 12 mg/ml protein solution. Selenomethionine-substituted TRAS1-EN was expressed in strain B834pLysS(DE3) (Novagen) grown in LeMaster medium (18) containing 25 mg/liter L-selenomethionine; subsequent purification steps were the same as those for native TRAS1-EN.
Crystallization was carried out using the sitting drop vapor diffusion method. Crystals of TRAS1-EN were typically grown in 10 days from a drop consisting of 2.5 μl of protein solution and 2.0 μl of reservoir solution at 283 K. The reservoir solution contained 0.1 M sodium phosphate (pH 6.0), 2 M ammonium sulfate, 3% (v/v) isopropyl alcohol, 1 mM MgSO4, and 15-20% (v/v) glycerol.
Data Collection, Multiwavelength Anomalous Dispersion (MAD) Phasing, and Model Refinement-A single crystal was picked from the droplet with a nylon loop and flash-frozen in a cryostream at 100 K. The native and MAD data sets were collected at beamline BL-6B in Photon Factory (Tsukuba, Japan) and BL41XU in SPring-8 (Harima, Japan), respectively.
All data were indexed and integrated with MOSFLM (19), followed by scaling with SCALA (20). The crystals belonged to space group P3₂ with unit cell dimensions of a = b = 64.6 Å, c = 117.3 Å. The selenium site search and the MAD phasing calculation were done with SOLVE (21) using data from 20 to 2.5 Å resolution. 8 of 12 selenium sites were used for the phase calculation. The phases were determined to a mean figure of merit of 0.60. Density modification and initial model building were performed automatically by RESOLVE (22), resulting in a mean figure of merit of 0.68. The native data showed hemihedral twinning with twinning operator (k, h, -l) and a twinning fraction of 0.184 (see Supplemental Data); therefore, the intensities of the native data were processed with DETWIN (20) before refinement. Twinning tests on the MAD data sets gave no indication of twinning. Data collection and MAD phasing statistics are summarized in Table I. Further model building was done with the program O (23). Model refinement calculations, including rigid body refinement, simulated annealing, and individual B-factor refinement, were done with CNS version 1.1 (24). After several rounds of refinement, Rcryst and Rfree (8% of the data) dropped to 22.5 and 25.3%, respectively.
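For readers unfamiliar with detwinning, the following is a minimal sketch of the standard algebraic relation for a two-domain (hemihedral) twin. It assumes that each observed intensity is a weighted sum of the true intensities of two twin-related reflections, with twin fraction α. The function names are hypothetical, and this is not a description of how DETWIN is implemented internally.

```python
def twin(i1, i2, alpha):
    """Simulate observed intensities for a twin-related reflection pair.

    i1, i2 : true intensities of the two twin-related reflections
    alpha  : twin fraction (0 <= alpha < 0.5)
    """
    j1 = (1.0 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1.0 - alpha) * i2
    return j1, j2


def detwin(j1, j2, alpha):
    """Invert the twinning relation to recover the true intensities."""
    if not 0.0 <= alpha < 0.5:
        # At alpha = 0.5 (perfect twin) the relation cannot be inverted.
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


# Round trip with the twin fraction reported in the text (0.184);
# the intensities 100 and 40 are made-up illustration values.
observed = twin(100.0, 40.0, 0.184)
recovered = detwin(observed[0], observed[1], 0.184)
print(observed, recovered)
```

The division by (1 - 2α) amplifies measurement errors as α approaches 0.5, which is why this kind of correction is practical at a moderate twin fraction such as the one reported here.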
Mutagenesis of TRAS1-EN-The H237A mutant was constructed in a previous study (16). The point mutations were generated with a QuikChange site-directed mutagenesis kit (Stratagene) according to the manufacturer's instructions. The sequences of the primers used for the introduction of these mutations are available on request. The Δloop mutant, in which Ile-199-Arg-200 and Lys-203-Arg-204 were deleted (see "Results"), was constructed as follows. First, a DNA fragment from 3788 to 4397 bp of pHisT1EN was removed by NdeI-KpnI digestion (pHisT1EN Δ3788-4397), and then a PCR fragment was amplified with the primers S3788 (5′-AAAAACATATGCACGGCGAGCAGTGGAA-3′) and A4393 (5′-AAAAAGGTACCCTCCTCTGATCGTATCAAATGTCGGGAC-3′). This fragment was digested with NdeI and KpnI (their recognition sites are CATATG in S3788 and GGTACC in A4393, respectively) and subcloned into pHisT1EN Δ3788-4397, resulting in a 6-bp deletion from 4394 to 4399. Second, Lys-203 and Arg-204 were deleted by inverse PCR using the 5′-phosphorylated primers S4388 (5′-GGAGGGTACCAAAGCCGCGTGGATGTG-3′) and A4381 (5′-CGTATCAAATGTCGGGACATCTCCCTCGT-3′). The amplified product was then self-ligated, resulting in a 6-bp deletion from 4382 to 4387. The correct introduction of the mutation was confirmed by DNA sequencing.
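As a quick sanity check on the cloning primers, the sketch below locates the NdeI (CATATG) and KpnI (GGTACC) recognition sequences in S3788 and A4393. It assumes that the hyphen inside the printed A4393 sequence is a line-break artifact and not part of the primer; the function name is hypothetical.

```python
# Recognition sequences of the two restriction enzymes used for subcloning.
SITES = {"NdeI": "CATATG", "KpnI": "GGTACC"}

# Primer sequences as given in the text (5' to 3'), with the line-break
# hyphen in A4393 removed (assumption).
PRIMERS = {
    "S3788": "AAAAACATATGCACGGCGAGCAGTGGAA",
    "A4393": "AAAAAGGTACCCTCCTCTGATCGTATCAAATGTCGGGAC",
}


def find_sites(seq):
    """Return {enzyme: 0-based index of first occurrence} for sites present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Both sites sit behind a short 5′ run of A residues, a common way to give the enzyme some flanking sequence to bind when cutting near the end of a PCR product.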
Plasmid Nicking Study-TRAS1-EN for use in the nicking assay was expressed in the same way as the protein used for crystallization, except that the fractions from the HiTrap chelating column were directly dialyzed against storage buffer (50 mM Tris-Cl (pH 8.0), 500 mM NaCl, 20% glycerol, 10 mM 2-mercaptoethanol) using the 10-kDa Ultrafree-MC centrifugal filter unit (Millipore Corp.) to obtain the one-step purified TRAS1-EN protein with an N-terminal 22-amino acid tag. The silkworm telomeric repeat, (TTAGG)n, was amplified from the pGL3-Enhancer vector containing 51 telomeric repeats (25) and subcloned into pGEM T easy (Promega). The endonuclease activity of TRAS1-EN was assayed by incubating 1.0 μg of supercoiled pGEM T easy-(TTAGG)25 with 50 ng of TRAS1-EN protein for 40 min at 25°C. The reaction buffer contained 15 mM PIPES-Cl (pH 6.0), 10 mM NaCl, 10 mM MgCl2, and bovine serum albumin (100 μg/ml). The reactions were electrophoresed on a 1.5% agarose gel and stained with ethidium bromide; the supercoiled, open circular, and linear DNA were quantified by densitometry. The nicking activities were evaluated as the proportion of supercoiled DNA that was converted into open circular and linear forms. Each value was obtained from five independent experiments.
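The readout of the plasmid assay can be written down as a short calculation. The sketch below assumes that the three densitometry values in a lane account for all of the plasmid DNA and that the starting material was entirely supercoiled; the function names and the example band intensities are hypothetical, not values from this study.

```python
from statistics import mean


def nicking_activity(supercoiled, open_circular, linear):
    """Fraction of plasmid converted from supercoiled to open circular or linear form."""
    total = supercoiled + open_circular + linear
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return (open_circular + linear) / total


def relative_activity(mutant_lanes, wild_type_lanes):
    """Mean mutant activity as a percentage of mean wild-type activity.

    Each argument is a list of (supercoiled, open_circular, linear) tuples,
    one tuple per independent experiment.
    """
    mutant = mean(nicking_activity(*lane) for lane in mutant_lanes)
    wild_type = mean(nicking_activity(*lane) for lane in wild_type_lanes)
    return 100.0 * mutant / wild_type


# Made-up densitometry values for illustration only.
wt = [(20, 70, 10), (25, 65, 10)]
mut = [(84, 14, 2), (85, 13, 2)]
print(round(relative_activity(mut, wt), 1))
```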
Oligonucleotide Cleavage-32P-labeled telomeric DNA was prepared as described previously (16). The 40-bp oligonucleotide containing the 28 S rDNA sequence of the silkworm (5′-ACGAGATTCCCACTGTCCCTATCTACTATCTAGCGAAACC-3′) was also radiolabeled, annealed with the nonlabeled complementary strand, and gel-purified. 1 ng of the substrate DNA was treated with 0.2 μg of purified protein in 50 mM PIPES-Cl (pH 6.0), 7.5 mM NaCl, 2 mM MgCl2, and bovine serum albumin (100 μg/ml) in a 50-μl volume at 25°C. The incubation was stopped after 40 min by the addition of EDTA to a final concentration of 50 mM. The resulting fragments were denatured at 95°C, ethanol-precipitated, and separated on a 30% polyacrylamide denaturing gel, together with the end-labeled oligonucleotides 5′-TTAGGTTAGGTT-3′ and 5′-TTAGGTTAGGTTAGGTT-3′ for Fig. 5D. Quantification of the bands was carried out with a BAS 5000 imaging analyzer system (Fujifilm). To determine the percentage of cleaved substrate, the summed intensity of the cleaved products, after subtracting the lane background and the signal from unresolved products, was divided by the radioactivity of the substrate before the reaction. Significant differences were determined using Student's t test, with p < 0.05 considered to be significant.
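The quantification described above can be illustrated with a small sketch. It assumes a single background value subtracted from every product band and computes a pooled-variance (Student's) two-sample t statistic by hand; converting that statistic to a p value requires a t distribution table or a statistics package. All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def percent_cleaved(product_bands, substrate_before, background=0.0):
    """Percentage of substrate converted into cleaved products.

    product_bands    : intensities of the cleaved-product bands in one lane
    substrate_before : intensity of the uncut substrate before the reaction
    background       : lane background subtracted from each product band
    """
    cleaved = sum(band - background for band in product_bands)
    return 100.0 * cleaved / substrate_before


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, n_a + n_b - 2


# Made-up band intensities: three product bands against an input of 1000 units.
print(percent_cleaved([60, 50, 40], 1000))
# Made-up replicate percentages for two substrates.
print(student_t([15.2, 14.1, 15.9], [4.0, 4.6, 4.3]))
```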

RESULTS
TRAS1-EN Efficiently Digests Both (TTAGG)n and (TTAGGG)n-Previous studies have shown that TRAS1-EN first generates a nick between T and A of the (TTAGG)n bottom strand of insect telomeric repeats and then between C and T of the (CCTAA)n top strand (Fig. 1A) (16). To confirm the sequence specificity of TRAS1-EN, its endonucleolytic activity was tested on telomeric repeats from several organisms using double-stranded oligonucleotide substrates. TRAS1-EN digested the (TTAGGG)5 human-type telomeric repeats between T and A, producing ladder bands at intervals of exactly 6 bp (Fig. 1B). Compared with the (TTAGGG)5 bottom strand, TRAS1-EN showed lower specificity for the (CCCTAA)5 top strand. It also digested the 5′-(TTAGGC)5/5′-(GCCTAA)5 repeats of Caenorhabditis elegans at two specific sites, although its specificity seemed lower. No obvious pattern of site-specific digestion was observed for the 5′-(TTTAGGG)3/5′-(CCCTAAA)3 repeats of Arabidopsis thaliana, the 5′-(TTTTAGGG)3/5′-(CCCTAAAA)3 repeats of Chlamydomonas reinhardtii, or the 5′-(TGTGTGGG)3/5′-(CCCACACA)3 repeats of Saccharomyces cerevisiae. These data demonstrate that TRAS1-EN is specific to the consecutive (GGTTAGG) sequence that is conserved in insect (TTAGG)n and human (TTAGGG)n but not in the other four telomeric sequences tested. In support of the above observations are previous findings that TRAS1-EN recognizes less than 10 bp surrounding the bottom strand cleavage site (7 bp upstream and 3 bp downstream) and that the GTTAG sequence (TTAGGTT↓AGG, where ↓ represents the cleavage site) is essential for the cleavage reaction (16).

FIG. 1. The telomeric repeat-specific LINE, TRAS1, and its endonuclease activity. A, schematic structure and integration process of TRAS1. ORF2 of TRAS1 encodes EN and reverse transcriptase (RT) domains. The vertical lines near the C terminus of both ORFs represent the cysteine-histidine motifs. In the initial step of TRAS1 retrotransposition, EN makes a specific nick between T and A of the bottom strand (TTAGG)n and then between C and T of the top strand (CCTAA)n (I). By target-primed reverse transcription, TRAS1 inserts into the Bombyx telomeric repeats with its poly(A) tail facing toward the centromere (II). The shaded triangles indicate the telomeric repeats. B, nicking activity of TRAS1-EN for telomeric repeat sequences from various organisms. Single-stranded telomeric DNA corresponding to the bottom and top strands was end-labeled and annealed to the complementary nonlabeled strand. The resulting double-stranded DNA substrate was digested with 0.2 μg of purified TRAS1-EN protein (16), and the products were separated on 28% denaturing polyacrylamide gels. The end-labeled DNA substrates and reaction times (in minutes) at 25°C are indicated above the photographs. The intervals between the specific cleaved products are shown on the left for the (TTAGG)n and (TTAGGG)n substrates.
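The ladder spacing reported above follows directly from where the GGTTAGG sequence recurs in each repeat. The sketch below is a toy string-matching model, not a model of the enzyme: it assumes that a nick is possible only inside a complete GGTTAGG (between T and A), and it looks at the bottom strand only. The function names are hypothetical.

```python
import re

# Sequence reported in the text as the specific target; the nick falls
# between T and A, i.e. after the fourth base (GGTT|AGG).
MOTIF = "GGTTAGG"
CUT_OFFSET = 4


def predicted_cut_sites(strand):
    """0-based positions of the base immediately 3' of each predicted nick."""
    # A lookahead is used so that overlapping occurrences are all found.
    return [m.start() + CUT_OFFSET for m in re.finditer("(?=" + MOTIF + ")", strand)]


def ladder_intervals(sites):
    """Spacing (in nucleotides) between consecutive predicted nicks."""
    return [b - a for a, b in zip(sites, sites[1:])]


substrates = {
    "insect (TTAGG)5": "TTAGG" * 5,
    "human (TTAGGG)5": "TTAGGG" * 5,
    "C. elegans (TTAGGC)5": "TTAGGC" * 5,
    "A. thaliana (TTTAGGG)3": "TTTAGGG" * 3,
    "C. reinhardtii (TTTTAGGG)3": "TTTTAGGG" * 3,
    "S. cerevisiae (TGTGTGGG)3": "TGTGTGGG" * 3,
}

for name, seq in substrates.items():
    sites = predicted_cut_sites(seq)
    print(name, sites, ladder_intervals(sites))
```

The model reproduces the 5-nucleotide and 6-nucleotide spacing for the insect and human repeats and predicts no sites in the plant, algal, and yeast repeats. It is deliberately crude: it also predicts no sites in the C. elegans repeat, where the text reports weaker digestion at two sites, so an exact GGTTAGG match should be read as a simplification.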
Sequence Comparison of TRAS1-EN and Other LINE-ENs-To initially predict the regions within the endonuclease domains that might be responsible for catalytic activity and sequence-specific recognition, the endonuclease domain sequences of TRAS1 and various LINEs were aligned and compared with the human APE1 sequence (Fig. 2). TRAS3, which is a subfamily of TRAS found in B. mori, has the same integration site in the (TTAGG)n telomeric repeats as TRAS1 (29). SART1 is another telomeric repeat-specific element that integrates between T and A of the (CCTAA)n bottom strand (30), and R1Bm is a 28 S rDNA-specific element from B. mori (13, 31). Several amino acids are conserved among all elements and are dispersed throughout the whole region; these seem essential for the endonucleolytic activity of APE-like endonucleases. Some of these amino acids have already been shown to be essential for the activity of L1-EN (8).
Overall, however, the total amino acid sequence identity of the endonuclease domains is relatively low, not only between TRAS1 and L1 (13%) but even between the sequence-specific retroelements of B. mori (TRAS1/TRAS3, 58%; TRAS1/R1Bm, 27%; TRAS1/SART1, 25%). The sequence identity between APE1 and any LINE-EN is quite low (TRAS1/APE1, 11%). It is remarkable that the two regions shown by thin gray lines in Fig. 2 are present in APE1 and other cellular AP endonucleases but are absent from LINE-ENs. Seven long amino acid tracts (Fig. 2, purple boxes) are highly conserved between TRAS1 and TRAS3, giving important information on the identity of the region responsible for specific recognition of the telomeric repeats.
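Pairwise identity values such as those quoted above depend on how alignment gaps are counted. The sketch below shows one common convention, counting identical residues over the columns where neither sequence has a gap. This convention is an assumption, since the text does not state which denominator was used, and the short sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns in which either sequence has a gap are ignored (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment: five gap-free columns, four of them identical.
print(percent_identity("ACD-EF", "ACDKEW"))
```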
The Overall Structure of TRAS1-EN-To investigate the particular features of TRAS1-EN that specifically recognize the telomeric repeats, we expressed, purified, and crystallized TRAS1-EN (see "Experimental Procedures"). The native and MAD data were collected with synchrotron radiation; the space group of the crystal was determined to be P3₂ with unit cell dimensions of a = b = 64.6 Å, c = 117.3 Å. Finally, the crystal structure of TRAS1-EN was determined at 2.4-Å resolution with Rcryst = 22.5% (Rfree = 25.3%) (Fig. 3). Two TRAS1-EN molecules per asymmetric unit were observed; the root mean square deviation between the Cα atoms of the two molecules was 0.70 Å. One molecule (denoted as chain A) contains a phosphate at the catalytic site, but the other (chain B) does not (Fig. 3B). The N-terminal 20 residues and part of the β2-β3 loop region (residues 59-66) were not observed in either chain due to disorder, and part of the β10-β11 region was also disordered in chain A. Chains A and B interact with each other at both the β7-α2 and β8-α3 regions; although the two chains interact in the same regions, the interface is quite asymmetric.
TRAS1-EN forms a four-layered α/β-sandwich; two β-sheets composed of six β-strands are packed face-to-face at the center by a hydrophobic interaction, and at the outer side, three long α-helices are located in parallel with the inner β-strands. The arrangements of the two β-sheets are β13-β14-β1-β2-β4-β3 and

FIG. 2. Amino acid sequence alignment of human APE1 and endonuclease domains from various non-long terminal repeat retrotransposons. The telomeric repeat-specific LINEs, TRAS1 and TRAS3, integrate into the same position of (TTAGG)n. SART1 represents another class of telomeric repeat-specific LINE, R1Bm is a 28 S rDNA-specific LINE, and L1 is a human non-sequence-specific LINE. The amino acids conserved among all elements are indicated by red letters. The relatively conserved amino acids (more than four matched in six sequences) are shown in boldface type. The dashes indicate gaps introduced to maximize homology. Seven regions highly conserved between TRAS1 and TRAS3 are shown in purple (29). In two regions, which are shown by gray lines under the sequences, LINEs have shorter amino acid tracts than APE1. Secondary structural information (α1-3, β1-14, 3₁₀A, and 3₁₀B) obtained by crystal analyses is shown above the amino acid sequences. The amino acids altered in mutagenesis studies (Fig. 5) are shaded blue.
Comparison of TRAS1-EN and Human APE1 Structure-As previously predicted from the sequence homology (Fig. 2) (8), the structural topology of TRAS1-EN is quite similar to that of AP endonucleases, such as human APE1 (32) and E. coli exonuclease III (33), or to that of the nonspecific endonuclease, DNase I (34). There are, however, variations in the lengths of the β-strands and loop regions; in particular, the loop between β9 and β12, which forms an extra β-hairpin, is not seen in other known AP endonucleases (Fig. 4A). A superposition of 79 Cα atoms of the central β-strands of TRAS1-EN with human APE1, exonuclease III, and DNase I gave root mean square deviations of 1.24, 1.27, and 1.62 Å, respectively.
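For reference, a root mean square deviation of the kind quoted above is computed from paired atomic coordinates after the two structures have been superposed. The sketch below covers only that final step and assumes the optimal superposition has already been applied and the atoms are listed in matching order; the coordinates are made up, and the function name is hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input, e.g. Å) between paired atoms.

    coords_a, coords_b : equal-length lists of (x, y, z) tuples, already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(coords_a))


# Two made-up atoms, each displaced by 1 Å along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```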
The superposed model of TRAS1-EN with human APE1 clearly shows that two loops at the DNA-binding surface are shortened in TRAS1-EN compared with human APE1 (regions 1 and 2 in Fig. 4B). These two loops correspond to the two regions that are missing in LINE-ENs but found in APE1 and other cellular endonucleases (Fig. 2, gray lines). The β9-β12 loop of TRAS1-EN is similar in length to the corresponding loop of APE1 (residues 266-279), but the structural folding seems quite different; APE1 has a short α-helix at residues 273-276, which lies outside of the DNA-binding surface (region 3 in Fig. 4B).
The Catalytic Site of TRAS1-EN-Several mutagenesis studies on human APE1 have revealed the residues essential for DNA-nicking activity (35-38). The structural studies on free (32) and DNA-bound (39) human APE1 confirmed that residues Asn-68, Glu-96, Tyr-171, Asp-210, Phe-266, Asp-283, and His-309 compose the catalytic site (Fig. 4C). These residues interact with each other by hydrogen bonding, forming a catalytic site suitable for endonuclease activity. In particular, His-309 and the divalent metal ion bound to Glu-96 directly contact the cleavage site (39). The residues forming the catalytic site of TRAS1-EN, as predicted from the sequence alignment with human APE1, are clustered at the top side of the α/β-sandwich (Fig. 4A). Fig. 2 shows that most of the essential amino acid residues in APE1 are conserved in endonuclease domains from other LINEs. In TRAS1-EN, Asn-29 (Asn-68 of APE1), Glu-56 (Glu-96), Tyr-126 (Tyr-171), Asp-157 (Asp-210), Asn-159 (Asn-212), Phe-197 (Phe-266), Asp-211 (Asp-283), and His-237 (His-309) are also conserved. In addition, comparison of the active sites shows that the spatial arrangements of these essential residues are also similar between TRAS1-EN and human APE1 (Fig. 4C). Despite adding MgSO4 to the crystallization buffer, we did not observe Mg2+ at Glu-56, which corresponds to the metal-bound residue (Glu-96) in the APE1 structure. Instead, residue Gln-31 is located between Glu-56 and His-237 in TRAS1-EN, which seems to interrupt metal binding to Glu-56. In APE1, the residue corresponding to Gln-31 is Asp-70. In the crystal structure obtained in this study, a phosphate ion was bound to the putative catalytic residues Tyr-126, Asp-157, and His-237 in chain A, at the same position as the DNA-backbone phosphate in the APE1/DNA structure (Fig. 4C).
We observed a water molecule inserted between His-237 and Asp-211 in the catalytic site of chain B (wat35 in Fig. 3A). The oxygen atom of wat35 interacts with the Nδ atom of His-237 (distance: 2.6 Å), the Oδ atom of Asp-211 (2.5 Å), and an oxygen atom of another water molecule, wat31 (2.6 Å). Residues His-237 and Asp-211 correspond to the His-Asp pair found in other endonuclease families, which is a critical component for breaking the phosphate backbone, and none of the previously reported AP endonuclease structures, human APE1 (32, 39), exonuclease III (33), and DNase I (34), contains a water molecule between the His-Asp pair. We cannot determine whether the inserted water molecule is critical to enzyme activity; however, the observation indicates that the His-Asp pair is not so rigid that the Oδ atom of Asp cannot be replaced by a water molecule.
Mutagenesis Studies of TRAS1-EN Reveal the Telomeric Repeat-specific Recognition Motif-Human APE1 is involved in the DNA base excision repair process and cleaves apurinic/apyrimidinic sites (40), although it shows no sequence specificity. The endonuclease domain of the human L1 element cleaves AT-rich regions with low sequence specificity (8). In contrast, TRAS1-EN digests the (TTAGG)n or (TTAGGG)n telomeric repeats in a highly specific manner (Fig. 1B) (16); hence, it should be possible to identify the region(s) in the TRAS1-EN structure that determine sequence specificity. In the putative DNA-binding surface, five loop regions (residues 29-33, 76-85, 127-132, 167-171, and 197-208) are present (Fig. 5A, indicated by cyan), with residues 197-208 forming a β-hairpin extending to the edge of the DNA-binding surface. The corresponding region in human APE1, residues 266-286, bends outwards and does not form β-sheets (region 3 in Fig. 4B) (32, 38).

FIG. 5 (continued). ... and Δloop mutants digested the DNA into very short nucleotides (indicated by the boldface lines). E, double-stranded 28 S rDNA labeled with 32P at the 5′-end of the noncoding strand was incubated with 0.2 μg of the purified TRAS1-EN mutant protein. D130A and Δloop mutants digested the 28 S DNA into short fragments (indicated by the open boxes). F, nicking activities of TRAS1-EN mutants for specific or nonspecific substrates. TRAS1-EN mutant proteins were mixed with telomere or 28 S substrates, the nicking activities were quantified by measuring the radioactivity of substrate DNA, and the sum of the cleaved products is shown in the graph. The cleaved percentages were compared by t test. *, significant difference, p < 0.05.
To determine which residues of TRAS1-EN are involved in sequence recognition, we performed mutagenesis studies. First, we made the plasmid DNA substrate, which includes the B. mori telomeric repeat sequence in the pGEM-T vector. Then the endonuclease activities of the purified EN proteins were optimized by measuring their ability to convert supercoiled plasmid containing (TTAGG)42 into open circular or linear DNA (Fig. 5B). We have already shown that the H237A mutation abolishes the nicking activity for the telomeric repeat substrates (16). Compared with wild-type TRAS1-EN, the E56A and H237A proteins showed about a 10-fold reduction in the activity to relax the plasmid substrate (Fig. 5, B and C). Then, eight polar residues (Arg-32, Gln-79, Asp-81, Asp-130, Lys-131, Lys-168, Arg-201, and Lys-204) that are located at the DNA-binding surface were selected and substituted with alanine. Arg-201 and Lys-204 are located on the β10-β11 hairpin structure described above. Further, Arg-32, Lys-168, and Arg-201 were also substituted with asparagine, arginine, and histidine, respectively, with reference to the amino acid composition of human L1 or APE1 around this region (Fig. 2). Among them, five mutations, R32N, R32A, K168R, R201H, and R201A, decreased the nicking activity to 20% of that of wild-type TRAS1-EN, which is a level similar to that of the loss-of-function mutants, E56A and H237A (Fig. 5, B and C). The nicking activities of the K168A and K204A mutants were reduced to 40% of wild-type TRAS1-EN. On the other hand, the other four mutants, Q79A, D81A, D130A, and K131A, showed more than 80% of wild-type activity (Fig. 5, B and C).
Since this plasmid-nicking assay cannot detect cleavage sites, a 5′-end-labeled (TTAGG)5 oligonucleotide was used as a substrate for the four active mutants described above (Fig. 5D). To investigate the involvement of the β10-β11 hairpin region in telomeric repeat-specific recognition, Ile-200-Arg-201 and Lys-203-Arg-204 were deleted, resulting in a short β-hairpin mutant of TRAS1-EN (Δloop). In addition to the specific substrate, we also prepared a substrate containing a 40-bp sequence from the 28 S rDNA of B. mori as a nonspecific DNA substrate for cleavage (Fig. 5E). This sequence is known as the target site of the 28 S rDNA-specific LINE, R1Bm, which is phylogenetically closely related to TRAS1 (25, 31). In the nicking assay of telomeric repeats with the D81A and K131A mutants, we observed ladder patterns at intervals of 5 bp, as seen with wild-type TRAS1-EN (TRAS1-EN(WT)) (Fig. 5D), suggesting that these mutations do not affect the telomeric repeat-specific recognition. However, the Q79A, D130A, and Δloop mutants did not show the 5-bp interval ladders and lost their specific activities for the (TTAGG)5 substrate. In particular, the Δloop mutant cleaved the (TTAGG)5 substrate in a non-sequence-specific manner and produced abundant 4-9-nucleotide fragments (represented by a thick vertical line in Fig. 5D). D130A also digested the substrate into 7-9-nucleotide fragments. With the 28 S rDNA substrate, a few products were detected in the lanes of TRAS1-EN(WT) and the D81A mutant (Fig. 5E). The Q79A and K131A mutants did not show nicking activity for the ribosomal substrate. Digestion of the 28 S rDNA substrate into smaller nucleotides was also observed with the D130A and Δloop mutants (indicated by an open box in Fig. 5E). These observations indicate that the D130A and Δloop mutants can cleave the target DNA nonspecifically, irrespective of the substrate sequence.
To investigate the target specificity of the mutant proteins in more detail, we quantified the sum of the cleaved products in each lane and compared the cleavage efficiency for the telomere substrate with that for the 28 S substrate (Fig. 5F). TRAS1-EN(WT) digested the telomere substrate, (TTAGG)5, and converted 15% of the substrate into smaller nucleotides, whereas only 4.3% of the ribosomal sequence was cleaved. This suggests that TRAS1-EN(WT) showed 3.5-fold higher selectivity for the specific (TTAGG)5 substrate than for the nonspecific 28 S substrate. As shown in Fig. 5D, D81A and K131A showed specific nicking activities for the (TTAGG)5 substrate, and these mutants also showed significant decreases in their activities for the nonspecific substrate (Fig. 5F). This indicates that Asp-81 and Lys-131 are involved in the cleavage reaction itself rather than in the determination of the substrate specificity. The most remarkable observation in this experiment was that Δloop and D130A showed no apparent difference in nicking activity between the two substrates, telomeric and ribosomal (Fig. 5F), indicating that the Δloop and D130A mutants lost their sequence specificities. Whereas Q79A showed 80% of wild-type activity in the plasmid assay, its activity was severely repressed for the oligonucleotide substrates (Fig. 5F). The Q79A mutant might prefer the supercoiled DNA structure for the cleavage reaction. The results shown above suggest that at least Asp-130 and the β10-β11 hairpin are involved in specific recognition of the telomeric repeat structure.
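The 3.5-fold figure quoted above is simply the ratio of the two cleavage percentages. A minimal sketch, using only the wild-type values stated in the text (the function name is hypothetical):

```python
def selectivity(percent_specific, percent_nonspecific):
    """Fold preference for the specific substrate over the nonspecific one."""
    if percent_nonspecific <= 0:
        raise ValueError("nonspecific cleavage must be positive to form a ratio")
    return percent_specific / percent_nonspecific


# Wild-type TRAS1-EN: 15% of (TTAGG)5 cleaved versus 4.3% of the 28 S substrate.
print(round(selectivity(15.0, 4.3), 1))
```

A mutant that has lost sequence specificity, as described for Δloop and D130A, would give a ratio close to 1 in this calculation.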

DISCUSSION
Catalytic Sites of TRAS1-EN Deduced from the Crystal Structure-The AP endonucleases and LINE1-EN belong to the class of Mg2+-dependent endonucleases (41). The endonucleases in this family share relatively low (<20%) sequence identities, although the highly conserved catalytic residues and hydrophobic residues in the β-sheet regions suggest that these proteins have the same α/β-sandwich structure.
Based on the crystal structure, we found that the spatial arrangements of highly conserved amino acid residues are the same in TRAS1-EN and human APE1 (Fig. 4C). In addition, the nicking activities of the H237A (16) and E56A mutants were lost completely (Fig. 5); both of these residues are conserved among all Mg2+-dependent endonucleases. These observations suggest that the highly conserved residues are also essential for TRAS1-EN activity.
Four mutants, R32N, R32A, R201H, and R201A, also showed a drastic decrease in nicking activity (Fig. 5, B and C). These results were unexpected but can be explained by the fine structure of the TRAS1-EN catalytic site; in TRAS1-EN, Arg-32 is located next to Glu-56, which is one of the essential residues. The Nε atom of Arg-32 interacts with the Oε atom of Glu-56. The N atom and the Nε atom of Arg-201 interact with the Oδ atom of Asp-236 and the Oγ atom of Thr-199, respectively. Both Thr-199 and Asp-236 are conserved residues among the LINE and AP endonucleases (Fig. 2); the substitution of Arg-32 and Arg-201 may therefore influence the structural formation of the catalytic site, leading to the loss of nicking activity.
The loss of nicking activity when Lys-168 was substituted with Arg was also an unexpected result, because both lysine and arginine have a positive charge and are often classified in the same amino acid category; in addition, Lys-168 does not interact with any atoms in the crystal structure. This result suggests that Lys-168 is somehow involved in DNA recognition or binding as well as in sequence recognition.
Recognition of Telomeric Repeats by TRAS1-EN-From the crystal structure and functional analyses, we found that at least Asp-130 is involved in recognition of the 5-bp repetitive telomere substrate. In particular, the β-hairpin region is suggested to be responsible for determining the substrate specificity of TRAS1-EN. A predicted DNA-binding model of TRAS1-EN, based on the DNase I-DNA complex (42), has been constructed (Fig. 6). This model also implies that TRAS1-EN contacts the DNA over 6 bp upstream of the cleavage site, which is consistent with the previous study showing that recognition by TRAS1-EN extends from 7 bp upstream to 3 bp downstream of the cleavage site on the (TTAGG)n strand (16). When we applied the telomere DNA sequence to the TRAS1-EN/DNA model shown in Fig. 6, Asp-130 was located near the 5′-TA-3′ site (5′-TTAGGTT↓AGGTT-3′). This implies that Asp-130 is involved in the recognition of the DNA backbone structure or bases around the cleavage site during the nicking reaction. As shown in Fig. 5, D-F, the deletion mutant Δloop completely lost its specific digestion pattern and substrate selectivity. The β10-β11 hairpin interacts with the minor groove of the DNA substrate, probably widening the minor groove and recognizing the sequence. Minor groove interaction and widening have also been suggested for human L1-EN (14).
TRAS3 is a LINE that shows the same target specificity as TRAS1 (i.e. the T-A junction of the telomeric repeat) (29). Consistent with this, Asp-130 is conserved in TRAS3, and the amino acid composition of the β10-β11 region is fully identical between TRAS1 and TRAS3 (Fig. 2). These data strongly support the conclusion that these regions define the specificity for the (TTAGG)n sequence.
In this study, we have defined the basis of the region involved in sequence-specific recognition by TRAS1-EN (Fig. 6).
In an earlier report, we showed that the endonuclease domain of LINEs is the primary determinant of target selection in vivo and that swapping of endonuclease domains from two different LINEs can alter the target specificity (12). Thus, LINEs have the potential to serve as targeting vectors into various genomic locations by swapping or manipulating the endonuclease domain. There are many sequence-specific LINEs that target specific sites within the genes of 28 S rRNA (reviewed in Ref. 43), 18 S rRNA, 5 S rRNA, small nuclear RNA, tRNA, transposons, and many repetitive elements (9, 44); therefore, endonuclease domains from these sequence-specific retroelements could be selected for integration into specific sites other than the telomere. Moreover, we also report that TRAS1-EN effectively cleaves not only insect (TTAGG)n but also human (TTAGGG)n telomeric repeats; a chimeric L1 that includes TRAS1-EN may therefore integrate into the human telomere in a site-specific manner.
After this work was completed, the crystal structure of the endonuclease domain of human L1 (L1-EN) was reported; it forms the same four-layered α/β-sandwich (28). The L1-EN structure shares with TRAS1-EN a similar conformation of three regions on the DNA-binding surface (Fig. 4B), which supports our suggestion that these regions are specific to LINE-ENs and play important roles in target-primed reverse transcription.