Identification and Expression of the TREX1 and TREX2 cDNA Sequences Encoding Mammalian 3′→5′ Exonucleases

The 3′→5′ exonucleases catalyze the excision of nucleoside monophosphates from the 3′ termini of DNA. We have identified the cDNA sequences encoding two 3′→5′ exonucleases (TREX1 and TREX2) from mammalian cells. The TREX1 and TREX2 proteins are 304 and 236 amino acids in length, respectively. Analysis of the TREX1 and TREX2 sequences identifies three conserved motifs that likely generate the exonuclease active site in these enzymes. The specific amino acids in these three conserved motifs suggest that these mammalian exonucleases are most closely related to the proofreading exonucleases of the bacterial replicative DNA polymerases and the RNase T enzymes. Expression of TREX1 and TREX2 in Escherichia coli demonstrates that these recombinant proteins are active 3′→5′ exonucleases. The recombinant TREX1 protein was purified, and exonuclease activity was measured using single-stranded, partial duplex, and mispaired oligonucleotide DNA substrates. The greatest activity of the TREX1 protein was detected using a partial duplex DNA containing five mispaired nucleotides at the 3′ terminus. No activity was detected using single-stranded RNA or an RNA-DNA partial duplex. Identification of the TREX1 and TREX2 cDNA sequences provides the genetic tools to investigate the physiological roles of these exonucleases in mammalian DNA replication, repair, and recombination pathways.

The multistep processes of DNA replication, repair, and recombination in human cells often require the excision of nucleotides from the DNA 3′ termini. For each cell division 4 billion nucleotides must be correctly replicated. The polymerization of incorrect or structurally modified nucleotides into DNA generates 3′ termini that block chain elongation by the DNA polymerases. Oxidative damage to DNA can result in fragmented nucleotides at the 3′ termini that cannot be elongated by DNA polymerases. Genetic recombination and mismatch repair pathways can require the removal of normal nucleotides from the 3′ termini of DNA chains. Enzymes containing 3′→5′ exonuclease activity remove these mismatched, modified, fragmented, and normal nucleotides to generate the appropriate 3′ termini for subsequent steps in the DNA metabolic pathways.
Several 3′→5′ exonucleases have been described from a variety of animal cells (1-5). These exonucleases demonstrate similar biochemical properties, but the relationships between these enzymes are not known. Also, there are 3′→5′ exonucleases contained in the structural domains of mammalian DNA pols δ (6), ε (7), and γ (8). These proofreading 3′→5′ exonucleases excise incorrectly polymerized nucleotides during DNA synthesis. Thus, a variety of 3′→5′ exonucleases are present in mammalian cells. These exonucleases might function in multiple pathways to generate 3′ termini that support further steps such as polymerization or ligation.
Excision of incorrectly polymerized nucleotides by proofreading 3′→5′ exonucleases is an important mechanism to minimize errors during DNA synthesis. The polymerase-associated proofreading exonuclease was first described using Escherichia coli DNA pol I (9) and further substantiated by the identification of viral, bacterial, and mammalian DNA polymerases with 3′→5′ exonuclease domains (10). A separate protein provides a proofreading function for the replicative DNA pol III in the Gram-negative bacteria (11). In these bacteria the exonuclease (ε subunit) and the polymerase (α subunit) are encoded by separate genes, and these subunits form a tight association within the polymerase III complex (12). The mammalian DNA pols α and β do not contain 3′→5′ exonucleases, and tightly associated proofreading exonucleases have not been unequivocally associated with these polymerases. However, the inefficient extension of mismatched 3′ termini by the mammalian DNA pol α leads to the suggestion that a separate proofreading exonuclease might function with this enzyme (13,14).
The incorporation of nucleotide analogs into DNA is a common strategy in antitumor and antiviral therapies. Polymerization of these analogs frequently generates 3′ termini that are poorly extended by DNA polymerases (15). The proofreading exonucleases associated with the mammalian DNA polymerases have been shown to remove some nucleotide analogs, but the efficiency of this excision is poor (16-19). In some cases, 3′→5′ exonucleases not associated with the mammalian DNA polymerases have been shown to remove nucleotide analogs from the 3′ termini (4,5). The actions of 3′→5′ exonucleases would be expected to limit the amount of nucleotide analog that is incorporated into DNA and diminish the therapeutic potential of these drugs.

The repair of oxidative DNA damage can require the removal of deoxyribose fragments from 3′ termini. In E. coli, the 3′-excision activities of exonuclease III and endonuclease IV are believed to be responsible for removing the 3′-deoxyribose fragments that block DNA polymerase action (20). The APE protein is the eukaryotic homolog of exonuclease III (21), and this enzyme provides the major apurinic endonuclease activity in mammalian cells. However, unlike exonuclease III, the APE protein has only a modest 3′→5′ exonuclease activity, suggesting that additional repair exonucleases might exist in mammalian cells to remove 3′-blocking lesions (22).
Models of genetic recombination and mismatch repair suggest that normal nucleotides might be removed from the 3′ termini of DNA by 3′→5′ exonucleases. The final steps of a genetic recombination process involving DNA reannealing might require a degradative repair step by a 3′→5′ exonuclease to remove unannealed DNA tails (23). In E. coli, a 3′-overhanging tail might be removed by exonuclease I. During DNA mismatch repair, excision reactions can result in the removal of hundreds of nucleotides to generate a gapped DNA molecule for repair synthesis (24). When repair is directed from a position 3′ to the mismatch, the 3′→5′ exonuclease activity of exonuclease I is required (25). Similar 3′→5′ exonucleases might be required in mammalian cells.
Detection of 3′→5′ exonuclease activities in mammalian cells has been relatively straightforward, but identification of the genes encoding these enzymes has been elusive. In a previous study, we identified a 30-kDa protein containing 3′→5′ exonuclease by activity gel analysis (19). We report here the identification and expression of the cDNA encoding this exonuclease. A second exonuclease was identified in data base searches. The two exonucleases have been named TREX1 and TREX2 to indicate the Three prime Repair EXonuclease activities of these enzymes. The TREX1 cDNA has also been identified in an independent study and referred to as DNase III (53). The sequences and properties of the TREX1 and TREX2 proteins suggest the involvement of these independent 3′-excision enzymes in the mammalian DNA replication and repair pathways.

[...] were used in a polymerase chain reaction to recover a fragment of the TREX1 bovine cDNA (TREX1b). The TREX1b fragment was cloned and sequenced using an automated DNA sequencer (Perkin-Elmer ABI Prism 377). The ESTs for the complete human TREX1 (TREX1h), mouse TREX1 (TREX1m), and mouse TREX2 (TREX2m) were identified in BLAST searches of the GenBank™ dbEST data base. TREX1h (GenBank™ accession no. W24304), TREX1m (GenBank™ accession no. AA242227), and TREX2m (GenBank™ accession no. AA060540) were obtained from Genome Systems Inc. (St. Louis, MO). The TREX2h was identified on a cosmid clone (GenBank™ accession no. AF002998).
For expression in E. coli the TREX1 and TREX2 cDNA sequences were recovered from parent vectors using polymerase chain reaction and cloned into the pOXO4 T7 expression vector (26) to generate the pTREX1h, pTREX1m, and pTREX2m plasmids. A ribosome binding sequence AGGAGGT was included eight nucleotides 5′ to the initiator ATG in the upstream oligomer. Expression of the TREX1 and TREX2 proteins was in E. coli BL21(DE3). Cells containing pTREX1 plasmids were grown to A595 = 0.2 in LB medium at 26°C, and isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 0.2 mM for 16 h. Cells containing pTREX2 were grown in M9 medium at 37°C. Cell extracts were prepared by sonication, and total protein in extracts was estimated by A280. The TREX1h and TREX1m proteins were purified from bacterial extracts essentially as described (4,19).
Exonuclease Reactions. Reactions contained 20 mM Tris-HCl, pH 7.5, 2 mM dithiothreitol, 5 mM MgCl2, 100 μg/ml bovine serum albumin, 12.5 nM 5′-32P-labeled oligomer as indicated in the figure legends, and 1 μl of the appropriate enzyme dilution. Partial duplex DNA substrates were prepared by hybridizing a 20-mer or 21-mer to a 35-mer as described (27). The 35-mer·21-mer duplexes contain one to eight mispaired nucleotides at the 3′ terminus. The RNA 10-mer, araCMP-terminated 21-mer, and the complementary DNA templates have been described (19,28). Enzyme dilutions were prepared in 1 mg/ml bovine serum albumin at 4°C. Reactions were performed at 37°C for the indicated times and were quenched by the addition of 30 μl of cold 95% ethanol. Samples were dried in vacuo and resuspended in 5 μl of 95% formamide dye mixture. Samples were heated at 100°C for 3 min and subjected to electrophoresis on a 23% polyacrylamide denaturing gel. Radiolabeled bands were visualized and quantified by phosphorimagery (Molecular Dynamics). One unit of exonuclease is the amount of enzyme needed to degrade 1 pmol of DNA 3′ termini in 1 min at 37°C.
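The unit definition above can be illustrated with a short calculation. This is a minimal sketch, not the authors' quantification procedure: the function names are hypothetical, the 50 μl reaction volume is borrowed from the Fig. 4 legend, and the 40% degradation in 10 min is an invented illustrative value, not a measurement from this study.

```python
def termini_pmol(oligo_nM: float, volume_ul: float) -> float:
    """Picomoles of labeled 3' termini present in a reaction.

    1 nM equals 1 fmol/ul, so nM * ul gives fmol; dividing by 1000 gives pmol.
    """
    return oligo_nM * volume_ul / 1000.0


def exonuclease_units(fraction_degraded: float, oligo_nM: float,
                      volume_ul: float, minutes: float) -> float:
    """Units = pmol of DNA 3' termini degraded per minute at 37 C.

    fraction_degraded is the share of labeled oligomer shortened by at
    least one nucleotide, as would be read from the phosphorimage.
    """
    if not 0.0 <= fraction_degraded <= 1.0:
        raise ValueError("fraction_degraded must lie between 0 and 1")
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return fraction_degraded * termini_pmol(oligo_nM, volume_ul) / minutes


# Illustrative values only: 12.5 nM oligomer (from the text), an assumed
# 50 ul reaction, and a hypothetical 40% of termini degraded in 10 min.
total = termini_pmol(12.5, 50.0)
units = exonuclease_units(0.4, 12.5, 50.0, 10.0)
print(f"{total:.3f} pmol termini, {units:.3f} units in the reaction")
```

Under these assumptions a reaction holds 0.625 pmol of 3′ termini, so measurable rates correspond to small fractions of a unit, which is consistent with the use of diluted enzyme.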
Peptide Sequence Analysis. The 30-kDa bovine exonuclease was subjected to SDS-PAGE and stained with Coomassie Blue. The protein band (12 pmol) was excised from the acrylamide gel and digested with trypsin in situ; the recovered peptides were separated by high pressure liquid chromatography. Selected peptides were sequenced by Edman degradation in the W. M. Keck Foundation Biotechnology Resource Laboratory at Yale University (Ken Williams, Director).

RESULTS
Identification of the TREX1 and TREX2 cDNA Sequences. In previous work from this laboratory, a 3′→5′ exonuclease was purified from human myeloblastic leukemia cells as an activity that removes araCMP from the 3′ termini of DNA (4). To obtain sufficient quantities of the 3′→5′ exonuclease for primary structural analysis, the same enzyme was purified from calf thymus tissue (19). The bovine exonuclease demonstrates the same DNA substrate specificity and physical properties as the human exonuclease (4). The bovine and human enzymes exhibit a native molecular mass of about 52 kDa by gel filtration analysis (4) (data not shown). The 3′→5′ exonuclease activities are associated with a 30-kDa protein when examined by SDS-PAGE activity gel analysis (19). Together these results suggest that the 3′→5′ exonuclease is a dimer composed of two identical subunits.
Purified bovine enzyme was used to generate tryptic peptides for primary sequence analysis. Four peptide sequences were obtained (Fig. 1). Degenerate oligomer primers were designed from two of the peptides, and polymerase chain reaction was used to recover a 201-base pair fragment of the bovine TREX1b cDNA. The nucleotide sequence of this 201-base pair fragment provided the deduced primary amino acid sequence of a contiguous 67-amino acid fragment of TREX1b (Fig. 1). Data base searches were performed using the DNA and deduced protein sequences of the TREX1b fragment to identify ESTs encoding the complete TREX1h and TREX1m cDNA sequences. The nucleotide sequence of the TREX1h cDNA confirmed the presence of a single complete open reading frame of 304 amino acids encoding a protein of 32,375 molecular weight (Fig. 1). Similarly, the TREX1m cDNA identified a protein of 304 amino acids with a calculated molecular weight of 32,629 (Fig. 1). Thus, the predicted size of the TREX1-encoded protein was in close agreement with the estimated size of the purified human and bovine exonucleases.
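The arithmetic behind these size comparisons is simple enough to check directly. The sketch below uses only numbers quoted in the text; the function names are hypothetical. Rounding the ratio of native mass to subunit mass is a rough heuristic for subunit number, not a substitute for the gel filtration and activity gel data.

```python
def residues_from_bp(n_bp: int) -> int:
    """Number of amino acids encoded by an in-frame fragment of n_bp base pairs."""
    if n_bp % 3 != 0:
        raise ValueError("fragment length is not a whole number of codons")
    return n_bp // 3


def mean_residue_mass(molecular_weight: float, n_residues: int) -> float:
    """Average mass per residue in daltons."""
    return molecular_weight / n_residues


def apparent_subunits(native_mass: float, subunit_mass: float) -> int:
    """Nearest whole number of subunits implied by the native mass."""
    return round(native_mass / subunit_mass)


# 201-bp PCR fragment -> 67 contiguous amino acids of TREX1b
print(residues_from_bp(201))

# Calculated molecular weights of the 304-residue TREX1h and TREX1m proteins
for name, mw in (("TREX1h", 32375), ("TREX1m", 32629)):
    print(name, round(mean_residue_mass(mw, 304), 1), "Da per residue")

# ~52-kDa native mass against the calculated TREX1h subunit mass
print(apparent_subunits(52000, 32375), "subunits")
```

The ratio of native to calculated subunit mass is about 1.6, which rounds to two subunits and agrees with the dimer proposed from the gel filtration and activity gel results.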
A second 3′→5′ exonuclease cDNA TREX2 was identified in the GenBank™ data base. The TREX2 sequence was detected in a BLAST search of the dbEST data base using TREX1 as the query sequence. The mouse TREX2m cDNA was found in the EST data base, but no EST was identified for the human TREX2h. The low abundance of TREX2m EST sequences and the absence of the TREX2h in the data base suggest that the TREX2 mRNA might be a rare or unstable mRNA in mammalian cells. The TREX2h sequence was identified as a single exon on a cosmid clone by searching the GenBank™ data base. The TREX2h and TREX2m nucleotide sequences encode proteins of 236 amino acids with molecular weights of 25,922 and 25,955, respectively. Alignments of the TREX1 and TREX2 sequences indicate that the two exonucleases are approximately 44% identical (Fig. 1). The cosmid clone containing the TREX2h sequence contains chromosomal DNA from the Xq28 region between markers DXS904 and BGN, suggesting that the TREX2h gene maps to this position on the X-chromosome.
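A percent-identity figure such as the 44% quoted above depends on how the alignment columns are counted. The following sketch shows one common convention, counting only columns in which both sequences have a residue. The convention is an assumption, since the text does not state which denominator was used, and the two aligned strings are invented toy peptides, not TREX sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical peptides): 4 identical residues in 6 ungapped columns
print(round(percent_identity("MKTA-LD", "MRTAQLE"), 1))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, and each gives a somewhat different value for the same alignment.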
Analysis of the TREX Sequences. Three sequence motifs are present in the TREX proteins that identify amino acids likely positioned at the exonuclease active sites. Structural and mutagenesis studies in the exonuclease domains of the E. coli DNA pol I large fragment and the bacteriophage T4 DNA pol have identified specific amino acids that participate directly in the exonuclease catalyzed reaction (30-35). These active site residues are present in three conserved motifs Exo I, II, and III that have been identified in nucleases from viruses to mammals (36-38). The sequences of the TREX proteins indicate that an Exo I motif is positioned near the N terminus between amino acids 5 and 18 for TREX1 and between amino acids 11 and 24 for TREX2 (Fig. 1). An Exo II motif is present between amino acids 111 and 125 for TREX1 and between amino acids 114 and 128 for TREX2, and an Exo III motif is positioned nearest the C terminus between amino acids 182 and 196 for TREX1 and between 185 and 199 for TREX2. The positions of the Exo motifs in the TREX1 and TREX2 proteins are similar, and the amino acid sequences within these motifs are highly conserved. The TREX1 protein also contains an additional 68 amino acids at the C terminus that are not present in the TREX2 protein.
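The motif coordinates given above can be tabulated to make the positional similarity explicit. This is a bookkeeping sketch only: the dictionary layout and function names are hypothetical, and the numbers are the residue positions and protein lengths stated in the text.

```python
# Inclusive residue ranges of the Exo motifs, as quoted in the text.
MOTIFS = {
    "TREX1": {"Exo I": (5, 18), "Exo II": (111, 125), "Exo III": (182, 196)},
    "TREX2": {"Exo I": (11, 24), "Exo II": (114, 128), "Exo III": (185, 199)},
}
LENGTHS = {"TREX1": 304, "TREX2": 236}


def motif_length(span: tuple) -> int:
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


def start_offsets() -> dict:
    """Shift of each TREX2 motif start relative to the matching TREX1 motif."""
    return {
        name: MOTIFS["TREX2"][name][0] - MOTIFS["TREX1"][name][0]
        for name in MOTIFS["TREX1"]
    }


for name in MOTIFS["TREX1"]:
    print(name,
          motif_length(MOTIFS["TREX1"][name]),
          motif_length(MOTIFS["TREX2"][name]))
print(start_offsets())
print(LENGTHS["TREX1"] - LENGTHS["TREX2"], "residue length difference")
```

Each motif spans the same number of residues in both proteins, the TREX2 motifs start three to six residues later than their TREX1 counterparts, and the length difference of 68 residues matches the C-terminal extension of TREX1.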
The TREX proteins were aligned with selected exonucleases to identify specific amino acids that might participate directly in the exonuclease reaction. Specific carboxylate residues within the Exo motifs of DNA pol I coordinate two metal ions that are critical in the hydrolytic mechanism of DNA cleavage by this exonuclease (30,31). Four conserved carboxylate residues in the Exo motifs of TREX1 and TREX2 were identified in this alignment (Fig. 2). The Asp and Glu in Exo I (Fig. 2, labeled 1 and 2) and the Asp in Exo III (Fig. 2, labeled 4) align with the three carboxylate residues in DNA pol I that contribute to the binding of one metal ion required for activity. The Asp in Exo II (Fig. 2, labeled 3) aligns with the fourth carboxylate residue in DNA pol I that contributes to the binding of a second metal ion. The specific alignment of these carboxylate residues in the TREX proteins with the corresponding amino acids in the exonuclease domain of DNA pol I suggests that these residues contribute to metal binding in the TREX proteins. Activity assays demonstrate that a divalent metal is required for activity in the TREX1 and TREX2 exonucleases (4). The number of bound metal ions required for activity is not known. This sequence alignment suggests that the TREX proteins utilize a two-metal ion mechanism similar to that utilized by DNA pol I for phosphodiester bond cleavage.
The TREX proteins contain an HXAXXD sequence that is present in the Exo III motif of a limited number of exonucleases (Fig. 2). Alignment studies suggest that most exonucleases contain the YXXXD sequence in the Exo III motif (38). Structural and mutagenesis studies of DNA pol I suggest a role for this conserved Tyr at the active site (31,32). For T4 DNA pol, mutations at this Tyr affect both the polymerase and the exonuclease activities (35,39). An alternative Exo III motif, named Exo IIIε, has been proposed based on mutagenesis studies of the exonuclease domain of Bacillus subtilis DNA pol III (40). Genetic studies of the ε subunit of DNA pol III indicate that the HXAXXD sequence in the Exo IIIε motif defines the active site residues in this region (41,42). The nucleases that contain the HXAXXD sequence include the ε subunits of DNA pols III from Gram-negative bacteria, the exonuclease domains of DNA pols III from Gram-positive bacteria, and the RNases T (38,43,44). The significance of the HXAXXD sequence in the TREX proteins and in the proofreading exonucleases will require additional mutagenesis and structural information on these proteins.
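The two Exo III signatures discussed above, HXAXXD and YXXXD, where X is any amino acid, translate directly into regular expressions. The sketch below shows how a candidate Exo III region might be screened for either signature. The function name is hypothetical and the test peptides are invented strings chosen to match each pattern; they are not taken from TREX or any other real protein.

```python
import re

# X in the motif notation means any amino acid (one uppercase letter here).
HXAXXD = re.compile(r"H[A-Z]A[A-Z]{2}D")
YXXXD = re.compile(r"Y[A-Z]{3}D")


def classify_exo3(segment: str):
    """Return which Exo III signature a segment contains, or None.

    HXAXXD is checked first, so a segment carrying both signatures is
    reported as HXAXXD.
    """
    segment = segment.upper()
    if HXAXXD.search(segment):
        return "HXAXXD"
    if YXXXD.search(segment):
        return "YXXXD"
    return None


# Hypothetical peptides for illustration only.
for peptide in ("GLHSAEGDVL", "KVYAFQDTE", "GGSTKLM"):
    print(peptide, classify_exo3(peptide))
```

A pattern match of this kind only flags candidates. Assigning a protein to the Exo IIIε subfamily also relies on the surrounding alignment and on the presence of the Exo I and Exo II motifs.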
The TREX1 and TREX2 cDNA Sequences Encode Active Exonucleases. The protein products of the TREX cDNA sequences were produced in E. coli to demonstrate the 3′→5′ exonuclease activities of the recombinant proteins. Extracts were prepared from cells containing the TREX1h, TREX1m, and TREX2m cDNA sequences, and the amount of 3′→5′ exonuclease was measured using a 5′-32P-labeled 23-mer oligonucleotide (Table I). In this assay, the total 3′→5′ exonuclease activity present in E. coli and any activity contributed by the recombinant protein are measured. The exonuclease activity detected in extracts of pTREX1h and pTREX1m transfected cells was about 40- and 17-fold greater, respectively, than in extracts of cells transfected with the control pOXO4 plasmid. Extracts prepared from pTREX2m transfected cells showed an 11-fold greater 3′→5′ exonuclease activity than the control cell extracts. To further confirm that increased exonuclease activity resulted from expression of the TREX proteins, the recombinant TREX1 proteins were purified from E. coli extracts. The recombinant TREX1 proteins exhibit chromatographic properties consistent with the exonucleases purified from human and bovine sources (4,19). Analysis of the purified recombinant TREX1m shows a major protein band corresponding to about 32 kDa by SDS-PAGE (Fig. 3). These results show that the TREX1 and TREX2 sequences encode active 3′→5′ exonucleases and that these recombinant exonucleases can be expressed in E. coli.
The catalytic properties of the TREX1 enzymes are indistinguishable from the 3′→5′ exonuclease previously purified from human leukemia cells (4). To characterize the recombinant proteins, the exonuclease activities of TREX1h and TREX1m were compared with the previously purified 3′→5′ exonuclease. In a previous study it was demonstrated that the leukemia cell 3′→5′ exonuclease removes araCMP and dNMPs from the 3′ terminus of a single-stranded oligomer and from the same oligomer hybridized to form duplex DNA (4). A direct comparison of purified TREX1h and TREX1m to the leukemia cell exonuclease demonstrates the same excision results (data not shown). In addition, ribonuclease activity was not detected in reactions containing an RNA 10-mer or the same RNA 10-mer hybridized to a DNA 40-mer using concentrations of enzymes 100-fold higher than used in the DNA excision experiments (data not shown). These results demonstrate that the TREX1 enzyme is the same exonuclease that was previously characterized in this laboratory.
The TREX1 Protein Prefers Mismatched 3′ Termini. To further characterize the substrate specificity of TREX1, the preference for matched or mismatched 3′ termini by the TREX1 enzyme was determined. The activity of the recombinant TREX1 enzyme was tested using a single-stranded oligomer and a series of partial duplex DNA substrates containing zero to eight mismatched nucleotides at the 3′ terminus (Fig. 4). The rates of excision by the purified recombinant TREX1m were measured in seven time course reactions containing a 5′-32P-labeled oligomer alone or hybridized to a second oligomer to form duplex DNA. The rate of excision of the 3′-terminal nucleotide is 2-fold greater using the single-stranded oligomer compared with the partial duplex with a matched 3′ terminus (data not shown). The rate of excision of the 3′-terminal nucleotide increases as the number of mispaired nucleotides increases. The maximum excision rate was detected when five mispaired nucleotides were present at the 3′ terminus. The excision rate using the substrate with five mispaired nucleotides was 6-fold greater than that using the paired substrate (Fig. 4). Increasing the number of mismatched nucleotides to eight results in a decrease in the excision rate (not shown). These results demonstrate that the TREX1 enzyme prefers mispaired 3′ termini. However, the 3-fold greater rate observed using the five mispaired substrate compared with the single-stranded oligomer alone suggests that the TREX1 enzyme prefers a partial duplex structure rather than single-stranded DNA.
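The three fold-differences reported above are mutually consistent, which can be seen by expressing every rate relative to the paired duplex. The sketch below does only that normalization. The dictionary keys and function name are hypothetical labels, and the values are the fold-differences from the text rather than absolute rates, which are not given.

```python
# Excision rates relative to the partial duplex with a matched 3' terminus.
RELATIVE_RATE = {
    "paired duplex": 1.0,
    "single-stranded": 2.0,   # 2-fold greater than the paired duplex
    "five mispairs": 6.0,     # 6-fold greater than the paired duplex
}


def fold_difference(substrate: str, reference: str) -> float:
    """Ratio of the excision rate on substrate to the rate on reference."""
    return RELATIVE_RATE[substrate] / RELATIVE_RATE[reference]


# Five mispairs against single-stranded DNA: 6 / 2 = 3-fold, as reported.
print(fold_difference("five mispairs", "single-stranded"))
```

The resulting ordering, five mispairs above single-stranded above paired duplex, is the basis for the conclusion that TREX1 favors a partial duplex with a frayed 3′ terminus over a free single strand.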

DISCUSSION
Two cDNA sequences, TREX1 and TREX2, have been identified that encode mammalian 3′→5′ exonucleases. The TREX1 enzyme was previously purified from human leukemia cells as a 3′→5′ exonuclease that removes araCMP from the 3′ termini of DNA (4) and subsequently purified from calf thymus tissue (19). Recombinant proteins were produced from the TREX1 and TREX2 cDNA sequences in E. coli to verify the exonuclease activities. These exonucleases function independently from DNA polymerases and exhibit structural and biochemical characteristics that indicate a role for these enzymes in excision reactions at the 3′ termini of DNA during replication, repair, or recombination. The physiological roles for these enzymes have not been determined, but identification of the cDNA sequences provides the necessary tools for these further studies.

TABLE I Expression of TREX proteins in E. coli
The strain BL21(DE3) containing the indicated plasmid was grown and treated with isopropyl-1-thio-β-D-galactopyranoside. Extracts were prepared, and exonuclease reactions containing 100 ng of protein and a 5′-32P-labeled 23-mer were performed. Products were separated on a polyacrylamide gel and quantified.
Identification of the TREX sequences provides genetic evidence for 3′→5′ exonucleases that function in mammalian cells. Biochemical evidence for the mammalian exonucleases has been available since the DNase III enzyme was first purified (1). Since the purification of DNase III, several 3′→5′ exonucleases have been isolated from a variety of animal cells (2-5,45). Differences in the biochemical properties of these enzymes have been used in attempts to distinguish these proteins. The catalytic properties of DNase III (1), DNase VII (2), and a "cytosolic" exonuclease (5) appear most similar to the TREX1 exonuclease (4). These enzymes degrade DNA only in the 3′→5′ direction, utilize either single-stranded or double-stranded DNA and not RNA, and require a divalent metal for activation. Gel filtration analysis indicates native molecular mass for DNase III and TREX1 at about 52 kDa, for the cytosolic exonuclease at about 50 kDa, and for the DNase VII enzyme at about 43 kDa. The DNase III and DNase VII enzymes degrade 3′-phosphoryl-terminated DNA and the TREX1 exonuclease does not. The products of DNase III digestion are both 5′-mononucleotides and dinucleotides. The products of digestion by the TREX1, DNase VII, and the cytosolic enzyme are exclusively 5′-mononucleotides. The TREX1 and cytosolic exonucleases remove nucleotide analogs from the DNA 3′ termini. The reported biochemical properties of the mammalian exonucleases make it difficult to determine unequivocally their relatedness. The identification of additional genes encoding exonucleases will be required to resolve this issue.
The TREX1 cDNA sequence was initially identified using a peptide sequence generated from the most abundant 3′→5′ exonuclease purified from bovine tissue. Recently, peptide sequences generated from the rabbit DNase III were used to identify this cDNA (53). The TREX1 cDNA likely encodes the most abundant 3′→5′ exonuclease activity detected in a variety of mammalian cells. The TREX2 cDNA was identified in data base searches using the TREX1 sequence. The TREX1 and TREX2 protein sequences are similar with an overall amino acid identity of about 44%. The sequences within the Exo motifs of TREX1 and TREX2 are about 80% identical. A major difference between the TREX1 and TREX2 proteins is the C-terminal 68 amino acids present in TREX1 but not in TREX2. The function of the C terminus in TREX1 is not known. A possible nuclear localization sequence RRPK is identified at position 258-261 in the TREX1m sequence (46). The presence of this putative nuclear targeting sequence in TREX1m suggests localization of this protein to the nucleus. However, the lack of identity between TREX1m and TREX1h at this four amino acid sequence raises questions about its function in nuclear localization. The TREX1 protein also contains a hydrophobic sequence at the C terminus between residues 277 and 294. This sequence might indicate a region of the TREX1 protein that interacts with the nuclear membrane, with DNA, or with other proteins.
Increasing evidence suggests that multiple 3′→5′ exonucleases function in mammalian cells. The TREX1 and TREX2 proteins are relatively small proteins containing independent exonucleases. Previously, exonuclease activities were identified as structural domains within larger proteins. These include the proofreading exonucleases associated with some mammalian DNA polymerases. Protein sequence alignment strategies have been developed to identify proteins that might contain exonuclease activities (37,47). These alignment studies indicated that the Werner's syndrome protein contains a possible exonuclease domain (47,48). Biochemical studies have confirmed the 3′→5′ exonuclease in the Werner's syndrome protein (49). Another study has shown that the p53 protein contains 3′→5′ exonuclease activity (50). These data suggest that several protein families might exist in mammalian cells that excise nucleotides from the 3′ termini of DNA.
The TREX enzymes could be involved in several DNA metabolic pathways. Analysis of the primary sequence and biochemical properties of the proteins might suggest a role in exonucleolytic proofreading. However, to consider this possibility one must expand the currently accepted view of proofreading that has been mostly limited to the removal of misinserted nucleotides by DNA polymerase-associated exonucleases. It is possible that the TREX enzymes function as proofreading exonucleases at the DNA 3′ termini without being physically associated with a DNA polymerase. The only known example of a separate protein serving as a proofreading exonuclease is the ε subunit exonuclease proofreading for the α subunit polymerase in the E. coli DNA pol III complex (12). Alone the ε subunit excises the 3′ termini efficiently but binds DNA poorly (51). The exonuclease activity of ε is stimulated about 50-fold upon association with the α subunit because of the increased binding affinity of the αε complex for the DNA 3′ termini (12). The TREX1 exonuclease excises the 3′ termini efficiently and also binds single-stranded DNA tightly with a Km = 4.3 nM (19). Perhaps the TREX enzyme binds the single-stranded DNA template ahead of the growing DNA chain and excises 3′-terminal nucleotides when the advancing DNA polymerase provides access to the 3′ termini. This access might be achieved when chain elongation is kinetically blocked as a result of a 3′-terminal mismatch, damaged nucleotide, or nucleotide analog.
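To put the 4.3 nM Km in context, the fraction of maximal velocity expected at a given substrate concentration can be estimated from the Michaelis-Menten relation v/Vmax = [S]/(Km + [S]). The sketch below assumes simple Michaelis-Menten behavior, which the text does not establish for TREX1, and applies the single-stranded DNA Km to the 12.5 nM oligomer concentration used in the exonuclease reactions. The function name is hypothetical.

```python
def fractional_velocity(substrate_nM: float, km_nM: float) -> float:
    """v / Vmax for a simple Michaelis-Menten enzyme: S / (Km + S)."""
    if substrate_nM < 0 or km_nM <= 0:
        raise ValueError("substrate must be non-negative and Km positive")
    return substrate_nM / (km_nM + substrate_nM)


KM_SSDNA_NM = 4.3     # Km for single-stranded DNA quoted in the text (19)
ASSAY_OLIGO_NM = 12.5  # oligomer concentration in the exonuclease reactions

print(round(fractional_velocity(ASSAY_OLIGO_NM, KM_SSDNA_NM), 2))
```

Under this assumption the enzyme would operate at roughly three-quarters of its maximal rate at the assay concentration of single-stranded substrate, and at half-maximal rate when the substrate concentration equals the Km.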
The TREX enzymes might play a role in genetic recombination or mismatch repair. Sequence analysis suggests that the TREX enzymes are closely related to the E. coli exonuclease I (47). In E. coli, it is proposed that exonuclease I excises unpaired single-stranded DNA tails during recombination (23) and generates repair patches by excising DNA in the 3′→5′ direction during mismatch repair (24). However, important distinctions in the biochemical properties of exonuclease I and TREX1 are apparent. Exonuclease I is a single strand specific exonuclease (52), whereas the TREX1 enzyme prefers a partial duplex DNA with multiple mispaired nucleotides at the 3′ terminus. Exonuclease I is highly processive (29), whereas the TREX1 enzyme is not a highly processive exonuclease. Thus, although sequence comparisons provide a guide to the possible roles for the TREX enzymes, no obvious physiological function can be gleaned from the currently available data.
The primary sequences of the TREX proteins provide some insights into the likely mechanism of these exonucleases. The conserved carboxylate residues in the Exo I-III motifs suggest a mechanism similar to the two-metal ion based mechanism proposed for DNA pol I (30,31). The alignment of these residues and the requirement of a divalent metal for activity support the proposal for this type of mechanism in the TREX proteins. However, an important distinction is apparent from the HXAXXD sequence present in the Exo III motif of the TREX proteins. Sequence alignments suggest that a subfamily of exonucleases have the HXAXXD sequence as an alternative to the YXXXD sequence within the Exo III motif (Fig. 2). Mutations in the HXAXXD sequence inactivate the ε subunit proofreading exonuclease of DNA pol III, suggesting that these residues are important in the exonuclease active site (41,42). Structural studies of DNA pol I indicate that the Tyr in the YXXXD sequence of the Exo III motif interacts with a phosphate oxygen of the 3′-terminal nucleotide in the DNA substrate (31). In contrast, the equivalent Tyr in the Exo III motif of bacteriophage T4 DNA pol points away from the active site (35). Currently, structural data of exonucleases containing the HXAXXD sequence in the Exo III motif are not available to assess the roles of the amino acids within this sequence.

FIG. 4. TREX1 prefers mispaired 3′ termini. Exonuclease reactions (50 μl) were prepared containing a partial duplex with a paired 3′ terminus (lanes 1-4) and with one mispair (lanes 5-8), two mispairs (lanes 9-12), three mispairs (lanes 13-16), and five mispairs (lanes 17-20) at the 3′ terminus. The purified TREX1m (5 pg) was added, and samples (10 μl) were removed at the indicated times after incubation. The positions of migration of the oligomers are indicated.
The TREX cDNA sequences identified in this study encode the first independent 3′→5′ exonucleases from mammalian cells. These exonucleases might remove 3′-terminal lesions or unpaired 3′ termini that are generated during DNA replication, repair, or recombination. A comprehensive study of various DNA substrates will be required to understand the full spectrum of DNA substrates utilized by these exonucleases. The availability of the TREX gene sequences will make it possible to investigate the physiological role of these enzymes in human cells.