Structure and Expression of the TREX1 and TREX2 3′→5′ Exonuclease Genes

The TREX1 and TREX2 genes encode mammalian 3′→5′ exonucleases. Expression of the TREX genes in human cells was investigated using a reverse transcription-polymerase chain reaction strategy. Our results show that TREX1 and TREX2 are expressed in all tissues tested, providing direct evidence for the expression of these genes in human cells. Potential transcription start sites are identified for the TREX genes using rapid amplification of cDNA ends to recover the 5′-flanking regions of the TREX transcripts. The 5′-flanking sequences indicate transcription initiation from consensus putative promoters identified −140 and −650 base pairs upstream of the TREX1 open reading frame (ORF) and −623 and −753 base pairs upstream of the TREX2 ORF. Novel TREX1 and TREX2 cDNAs are identified that contain protein-coding sequences generated from exons positioned in genomic DNA up to 18 kilobases 5′ to the TREX1 ORF and up to 25 kilobases 5′ to the TREX2 ORF. These novel cDNAs and sequences in the GenBank data base indicate that transcripts containing the TREX1 and TREX2 ORFs are produced using a variety of mechanisms that include alternate promoter usage, alternative splicing, and varied sites for 3′ cleavage and polyadenylation. These initial studies have revealed previously unrecognized complexities in the structure and expression of the TREX1 and TREX2 genes.

The multistep processes of DNA replication, repair, and genetic recombination often require the excision of 3′ nucleotides to generate DNA 3′ termini suitable for subsequent metabolic steps. The apparent diversity of proteins containing 3′→5′ exonuclease activity likely reflects the different requirements for these enzymes in the maintenance of the human genome. In some cases these exonucleases are found in large proteins that contain multiple catalytic and functional properties. The 3′→5′ proofreading exonucleases are functional domains in the mammalian DNA polymerases δ (1), ε (2), and γ (3). These proofreading enzymes remove incorrectly polymerized nucleotides during DNA synthesis and minimize the incorporation of mismatches into the genome. The Werner syndrome protein (WRN) contains a 3′→5′ exonuclease activity in one functional domain and a 3′→5′ DNA helicase activity in another (4,5). Deficiencies in the WRN protein increase genomic instability (6). The multifunctional p53 protein contains a 3′→5′ exonuclease localized to the central core domain (7). This core region in p53 also contains the sequence-specific DNA binding domain that functions in cell-cycle checkpoint control in mammalian cells (8). The hRAD1 (Ustilago maydis REC1) and hRAD9 proteins are human homologues of yeast DNA damage checkpoint response proteins (9). These proteins also contain 3′→5′ exonuclease activities (10-12). The yeast mre11 mutant is defective in recombinational DNA repair (13). The purified MRE11 protein (14,15) and a protein complex containing MRE11 contain 3′→5′ exonuclease activities (16). The TREX1 and TREX2 proteins are relatively small dimeric proteins that contain potent 3′→5′ exonucleases (17). The presence of 3′ excision activities in this apparently diverse collection of proteins, and likely others, probably reflects the multiple pathways present in human cells requiring the modification of DNA 3′ termini. However, insufficient information is currently available to understand the molecular pathways in which the different 3′→5′ exonucleases function.
Some insights into the catalytic requirements for 3′→5′ exonucleases have been gleaned from protein structure and mutagenesis studies and from protein sequence analysis. The proofreading exonuclease domains of the Escherichia coli DNA polymerase I large fragment and the bacteriophage T4 DNA polymerase have nearly identical folding patterns despite minimal overall sequence identity (18-20). Mutagenesis studies identify critical amino acids in three conserved motifs, Exo I, Exo II, and Exo III, that are positioned to coordinate two metal ions at the active site (21-23). The proofreading exonucleases of the mammalian DNA polymerases also contain these three Exo motifs (24). Statistical modeling strategies have been developed to identify additional proteins that might contain 3′ excision activity (25,26). This methodology revealed the conserved exonuclease motifs in the WRN protein (27,28), and biochemical analysis confirmed the 3′→5′ exonuclease activity in this protein (4,29). The TREX sequences contain the Exo I and Exo II motifs and a variation in the Exo III motif, renamed Exo IIIε. The Exo IIIε motif is characterized by the presence of the sequence HXAXXD rather than YXXXD (30-32) and is detected in the RNase T subfamily of exonucleases (25,28,33). The Exo IIIε motif in the TREX proteins suggests that these mammalian exonucleases most closely relate to the bacterial epsilon subunit of DNA polymerase III, exonuclease I, and the recently described exonuclease X (34).
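The distinction between the two Exo III variants can be illustrated with a simple pattern search. The following is a minimal sketch, assuming that X stands for any single residue; the function name and the toy peptides are hypothetical and are not taken from TREX or polymerase sequences. Patterns this short match by chance in many proteins, so a hit is only a starting point for the profile-based analyses cited above.

```python
import re

# Exo III variants as written in the text; X is treated as any residue.
EXO_III = re.compile(r"Y.{3}D")           # YXXXD
EXO_III_EPSILON = re.compile(r"H.A..D")   # HXAXXD

def classify_exo_iii(peptide: str) -> str:
    """Report which Exo III variant, if any, occurs in a peptide string."""
    seq = peptide.upper()
    if EXO_III_EPSILON.search(seq):
        return "Exo III-epsilon"
    if EXO_III.search(seq):
        return "Exo III"
    return "none"

# Hypothetical toy peptides used only to exercise the patterns.
for toy in ("GSHTALGDVK", "KLYAAGDSE", "MKKLLV"):
    print(toy, classify_exo_iii(toy))
```

The epsilon variant is tested first so that a peptide carrying both patterns is reported as Exo III-epsilon; that ordering is a design choice of this sketch, not a rule from the literature.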
The increasing number of 3′→5′ exonuclease activities detected in proteins from human genes indicates that a variety of structural folds distinct from the proofreading exonucleases are likely. The multifunctional Escherichia coli exonuclease III has a potent 3′ excision activity, and the structure of this protein is similar to APEX, the major human apurinic/apyrimidinic endonuclease (35,36). Conserved residues in these enzymes indicate a common catalytic mechanism involving a single metal ion. The 3′ excision activity of the APEX protein is relatively weak and appears to be influenced by substrate and reaction conditions (37) as well as by the structure of the 3′ terminal nucleotide (38,39). The structure of the p53 protein is similar to the exonuclease III and APEX proteins (40), and p53 protein is reported to contain 3′→5′ exonuclease activity (7). The hRAD1(REC1) and hRAD9 recombinant proteins contain 3′→5′ exonuclease activities, but extensive sequence and modeling analyses have not provided insights into the catalytic mechanisms of these proteins (41). Additional studies will be necessary to identify the complete repertoire of human genes encoding 3′→5′ exonucleases.
The gene for TREX1 encodes the major 3′ exonuclease activity measured in extracts prepared from mammalian cells. A 3′ exonuclease activity was first detected in biochemical assays and named DNase III by T. Lindahl et al. (42). Recently, the human and mouse cDNAs encoding 3′→5′ exonucleases were identified by sequencing peptides generated from the purified bovine (17) and rabbit (43) enzymes. A second closely related mouse cDNA, named Trex2 (footnote 3), was discovered in data base searches using the TREX1 cDNA as a query sequence (17). We have measured expression from both TREX genes using an RT-PCR strategy and investigated in detail the 5′-flanking regions of these genes. The human and mouse TREX1 proteins are 314 amino acids in length and not 304 as previously reported (17,43). Our analysis confirms expression of TREX1 and provides the first evidence for the expression of TREX2 in human cells. Novel cDNAs containing the TREX1 and TREX2 ORFs have been identified that contain exons spanning 18 kb for TREX1 and 25 kb for TREX2. The salient features of the TREX genes are presented in this report.

MATERIALS AND METHODS
DNAs-Oligonucleotide primers were synthesized in the DNA laboratory of the Wake Forest University Comprehensive Cancer Center and are listed in Table I. The bovine genomic DNA was from Sigma-Aldrich Co. (D1501). The mouse genomic DNA (strain 129 SV) was a generous gift from P. Dawson (Wake Forest University School of Medicine). The human genomic DNA was the BAC clone RP11-24C3 (no. AC021328) purchased from Research Genetics.
PCR of the TREX1 ORF from Genomic DNA-For amplification of TREX1 from genomic DNA, the PCRs (100 µl) contained 10 mM Tris-HCl, pH 9.0, 50 mM KCl, 0.1% Triton X-100, 200 µM dNTPs, 1.5 mM MgCl2, 50 ng of genomic DNA, and 1 µM each of the forward and reverse primers (Table I). The TREX2 PCRs also contained 5% Me2SO. Reactions were heated to 95°C for 5 min prior to addition of Taq DNA polymerase (2.5 units, Promega Corp.) at 80°C. The reactions were performed for 35 cycles at 95°C for 1 min, 60°C for 1 min, and 72°C for 2 min. The products were resolved by agarose gel electrophoresis, recovered from the ethidium bromide-stained gels using spin columns (Qiagen), and sequenced using a PerkinElmer Life Sciences ABI Prism 377 automated DNA sequencer.
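The reaction composition and cycling program translate into absolute amounts with simple arithmetic. The sketch below is illustrative only and assumes that the reaction volume is 100 microliters and that the dNTP and primer concentrations are micromolar (the unit prefixes are read as micro); the helper names are hypothetical. The cycling time excludes the initial 5-min denaturation and any temperature ramping.

```python
def amount_nmol(conc_uM: float, volume_ul: float) -> float:
    """Amount in nmol from a concentration in µM and a volume in µl."""
    # µM x µl gives pmol; divide by 1000 to express the result in nmol.
    return conc_uM * volume_ul / 1000.0

def cycling_minutes(cycles: int, step_minutes) -> int:
    """Total hold time for a program that repeats the same steps each cycle."""
    return cycles * sum(step_minutes)

REACTION_UL = 100                              # assumed to be µl
dntp_nmol = amount_nmol(200, REACTION_UL)      # 200 µM dNTPs (assumed µM)
primer_nmol = amount_nmol(1, REACTION_UL)      # 1 µM of each primer (assumed µM)
mgcl2_nmol = amount_nmol(1500, REACTION_UL)    # 1.5 mM MgCl2 = 1500 µM
total_min = cycling_minutes(35, (1, 1, 2))     # 95°C, 60°C, 72°C holds

print(dntp_nmol, primer_nmol, mgcl2_nmol, total_min)
```

Under these assumptions each reaction holds 20 nmol of dNTPs and 0.1 nmol of each primer, and the 35 cycles account for 140 min of hold time.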
RACE Analysis of the 5′-Flanking Regions of TREX1 and TREX2-Marathon ready cDNA from spleen (CLONTECH Laboratories, Inc.) was used to recover the 5′-flanking regions of TREX1 and TREX2 cDNAs. The two-round PCRs were performed according to the manufacturer's specifications using the nested Marathon Adapter primer pair and the TREX1- and TREX2-specific primer pairs indicated in Figs. 2, 4, 6, and 7. The TREX2 PCRs also contained 5% Me2SO. The first-round PCR products were fractionated on agarose gels and recovered from the gels using spin columns (Qiagen) in three separate size-selected pools. Samples of the size-selected products were used as templates in the second-round PCR. Distinct product bands were recovered from gels, cloned into the pGEM®-T Easy vector (Promega Corp.), and sequenced.
Expression Analysis by PCR of the TREX1 and TREX2 Transcripts-Total RNA was recovered from blast cells of a patient diagnosed with acute myeloblastic leukemia (AML) by guanidine isothiocyanate extraction and cesium centrifugation (44). The AML RNA was treated with RNase-free DNase I (Promega Corp.) and further purified using an RNeasy column (Qiagen). The AML RNA or tissue-specific RNA (CLONTECH Laboratories, Inc.) (5 µg) was hybridized to 0.5 µg of oligo(dT)15 primer (Promega Corp.) for 5 min at 95°C and then 10 min at 70°C. The RNA was reverse transcribed with SuperScript II (Life Technologies, Inc.) for 2 h at 42°C to generate cDNA. The PCR conditions using AML cDNA and the tissue-specific cDNA were as described above for genomic DNA. The nested primer pairs for the two-round PCR of AML cDNA are described in Figs. 3 and 5. The template for the first-round PCR was 200 ng of reverse transcribed AML RNA, and the template for the second round was a sample (1 µl) of the first-round PCR products. The products from the second round were resolved by agarose gel electrophoresis, recovered from the gel, and sequenced. A single-round PCR was performed for TREX1 and TREX2 expression analysis using the 13 human tissue-specific cDNAs and the primers indicated in the text.
Identification of Novel TREX1 and TREX2 Transcripts-Potential exons encoded in the genomic DNA of the 5′-flanking regions of the TREX1 and TREX2 ORFs were identified using the gene-finding algorithm GENSCAN (47). The PCR conditions for amplification of novel TREX1 and TREX2 cDNAs were as described above for genomic DNA. The specific primer pairs used in the two-round amplification reactions are indicated in the text and in Figs. 6 and 7. The products from the second round were resolved by agarose gel electrophoresis, recovered from the gel, and sequenced.

RESULTS AND DISCUSSION
The peptide sequences generated from a purified mammalian 3′→5′ exonuclease identified the human TREX1 cDNA from EST W24304 in the GenBank data base in two independent studies (17,43). More recently, we used the TREX1 sequence (no. AF151105) in a BLAST search of the GenBank data base to identify additional TREX1 ESTs (i.e. no. BE616406, AV764291, R23917, AA279657) and the human BAC clone RP11-24C3 (no. AC021328). Sequence alignments of these TREX1 ESTs indicated variations in the 5′-flanking regions (data not shown). These sequence variations prompted the systematic analysis presented in this work of the TREX1 cDNAs and the TREX1 genomic sequence in the human BAC clone RP11-24C3.
The TREX1 ORF-The genomic DNA sequences flanking the mouse, bovine, and human TREX1 genes were examined to confirm the single ORF structure of this gene. Initial studies of human and mouse TREX1 cDNA sequences identified a common ATG codon positioned near the 5′ end of the TREX1 ORFs (17). The recombinant proteins produced from the human and mouse TREX1 cDNAs using this ATG as a start codon generated active 3′→5′ exonucleases. However, mouse TREX1 ESTs (i.e. no. AI182180, BF577448, AA197643) contain a second in-frame ATG codon 30 nucleotides upstream, raising the possibility that the initiating methionine in the TREX1 ORF had not been identified. To identify the initiating Met for TREX1, genomic DNAs positioned at the 5′ end of the mouse, bovine, and human TREX1 ORFs were recovered using PCR, and the nucleotide sequences of these PCR products were determined. Primer pairs used in these reactions were designed from the TREX1 ESTs indicated in Fig. 1 and the available sequences in the GenBank dbEST data base (Fig. 1, Table I). The lengths of the genomic DNA fragments recovered from the PCRs were 1114 bp (mouse), 507 bp (bovine), and 532 bp (human). Alignments of the ESTs with the recovered genomic sequences identified consensus intron donor and acceptor sequences and indicated that an RNA splicing process modified the 5′-flanking regions of the TREX1 transcripts (data not shown).
Footnote 3: The discovery of a second gene closely related to the gene for TREX1/DNase III necessitated the renaming of DNase III (gene DRN) to TREX1. The TREX designation provides a unique symbol for the two closely related 3′ exonuclease genes and is used in other species. The murine orthologs of these genes have the approved symbols of Trex1 and Trex2. The TREX name reflects the biochemical activity and likely biological role as three prime repair exonucleases.
The deduced amino acid sequences were determined from the genomic DNA sequences and aligned using ClustalW to determine the relative identity at the 5′ ends of the TREX1 ORFs (Fig. 1). The alignment shows that the Met labeled 1 is the only Met conserved in all three mammalian sequences, indicating that this is the initiating Met of the TREX1 ORF. No sequence identity is detected prior to the proposed initiating Met. Additionally, the translated genomic sequences for mouse and bovine genomic DNA contain in-frame stop codons at positions −28 and −40, providing further support for the assignment of the initiating Met for TREX1 (Fig. 1). The translated human genomic sequence has two additional in-frame Met at positions −38 and −55. The significance of these potential Met codons is currently unknown. The human and bovine TREX1 sequences at the initiating ATG are identical to the Kozak consensus sequence (45), and the mouse sequence differs at a single position (Fig. 1). Additional PCRs of genomic DNA have confirmed the single ORF structures of TREX1 in the mouse, bovine, and human genomes (data not shown). The human and mouse TREX1 ORFs indicate a coding region of 314 amino acids, and the bovine TREX1 ORF is 315 amino acids in length. A Drosophila TREX homolog (no. AE003581) encodes a protein of 351 amino acids and contains two exons. The homologous relationship between the mammalian and Drosophila genes is apparent by computational analysis using the COGNITOR program (46). The products of these genes fit into the same cluster of orthologous groups of proteins represented by the E. coli DNA polymerase III-ε subunit. Although the biochemical relationship between these proteins is very likely to catalyze the removal of nucleotides from DNA 3′ termini, the evolutionary relationship between these genes, the cellular functions, and the three-dimensional structures are not known.
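The reasoning about upstream in-frame ATG and stop codons can be sketched as a walk upstream from a candidate start codon, one codon at a time. This is a minimal illustration, not the procedure used in this work; the function name and the toy sequence are hypothetical, and the offsets are counted in codons relative to the candidate start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_inframe_codons(seq: str, start: int):
    """List in-frame ATG and stop codons found 5' to the ATG at index `start`.

    Returns (codon_offset, codon) pairs, where offset -1 is the codon
    immediately upstream of the candidate start codon.
    """
    seq = seq.upper()
    if seq[start:start + 3] != "ATG":
        raise ValueError("start must point at an ATG codon")
    hits = []
    offset = -1
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon == "ATG" or codon in STOP_CODONS:
            hits.append((offset, codon))
        pos -= 3
        offset -= 1
    return hits

# Hypothetical toy sequence: CC TGA GCT ATG AAA [ATG] GCC
toy = "CCTGAGCTATGAAAATGGCC"
print(upstream_inframe_codons(toy, 14))
```

In the toy sequence an in-frame stop codon lies upstream of the extra ATG, so translation could still begin at the extra ATG; a stop codon lying between an upstream ATG and the candidate start would instead rule out the upstream ATG as the initiator of that ORF.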
The 5′-Flanking Region of TREX1 Transcripts-The 5′-flanking region of TREX1 cDNAs was examined using a 5′-RACE procedure. A two-round PCR was designed using spleen cDNA with the nested TREX1-specific reverse primers (T1 rv1 and T1 rv2) and the cDNA adapter primers. Seven independent clones ranging from 133 to 612 bp in length were recovered, and these sequences were aligned with the TREX1 genomic sequence (Fig. 2). To identify genomic sequences in the 5′-flanking region of TREX1 that might serve as transcription initiation sites, the sequence analysis Neural Network Promoter Prediction (NNPP) program was used. Two potential promoters positioned −140 and −650 bp from the TREX1 ORF are identified (Fig. 2). The 5′ ends of four TREX1 cDNAs (Fig. 2, labeled 3-6) align with the genomic sequence at positions indicating transcription initiation at the −650 consensus putative promoter sequence. Two of the cDNAs (Fig. 2, labeled 1 and 2) align at positions indicating transcription initiation at the −140 or the −650 putative promoter sequences. In addition, the 5′ end of another cDNA (Fig. 2, labeled 7) was positioned 5′ to both predicted promoter sequences, indicating additional or alternative promoters are present in the 5′-flanking region of TREX1. Two of the cDNAs (Fig. 2, labeled 6 and 7) were spliced at consensus intron donor and acceptor sequences that had been previously identified in human ESTs, providing further support for an RNA splicing modification of the 5′-flanking region of TREX1 transcripts.
Splicing of the TREX1 Transcripts and Expression in Human Tissues-The TREX1 cDNA sequences recovered in the 5′-RACE analysis were compared with the 5′-flanking regions of TREX1 ESTs available in the GenBank data base. These sequences indicated the presence of one intron donor sequence and two acceptor sequences (Fig. 3). Thus, in addition to the unprocessed TREX1 transcript, two splicing pathways were possible for processing of the TREX1 transcripts. It was predicted that splicing from the donor site to acceptor site A would generate a transcript encoding the complete TREX1 ORF, whereas splicing to acceptor site B would generate a transcript that lacks necessary TREX1 sequence to encode an active TREX1 protein (Fig. 3A). It is possible that these alternatively spliced TREX1 transcripts reveal a pathway for regulation of TREX1 by alteration of the mRNA stability or translation efficiency. A two-round PCR was designed to estimate the relative abundance of the three possible TREX1 transcripts using reverse transcribed RNA from AML cells. The TREX1 cDNAs were amplified using the nested TREX1 ORF-specific primers (T1 rv1 and T1 rv2) and the nested 5′-flanking region primers (T1 fr1 and T1 fr2). The three possible TREX1 transcripts were detected by agarose gel electrophoresis of the PCR products (Fig. 3B). Sequencing of the cloned products confirmed the identity of the least abundant 532-bp band as the product of the unspliced TREX1 transcript. The 212- and 102-bp bands resulted from amplification of the two spliced TREX1 transcripts. The most abundant band is the 212-bp product generated by splicing from the donor site to acceptor site A positioned 26 base pairs 5′ to the predicted initiating methionine codon. These data indicate that the most abundant TREX1 transcripts in AML cells, initiating at the −650 putative promoter, are processed by an RNA splicing mechanism that removes a 320-bp intron from the 5′-flanking region of the TREX1 transcripts. Analysis of mouse and bovine TREX1 ESTs in the data base reveals a similar RNA splicing pathway to conserved acceptor sites positioned at −7 bp in mouse and −21 bp in bovine prior to the initiating ATG codons, suggesting conservation of this mechanism between mammalian species.
The donor site to acceptor site A pathway is the predominant pathway for processing of TREX1 transcripts in various human tissues. To determine the pattern of TREX1 gene expression in human cells, a single-round RT-PCR experiment was performed using the T1 fr2 and T1 rv1 primers and reverse transcribed total RNA from 13 different tissues (Fig. 3). The three PCR products generated from TREX1 transcripts are detected in all tissues tested (Fig. 3C). In addition, the relative ratios of the three PCR products are similar in each tissue, with the 212-bp product being the most abundant in all tissues. Some quantitative differences in the staining intensities of the 212-bp products are apparent, suggesting variability in expression between tissue samples. These data suggest that TREX1 expression in spleen, prostate, thymus, and AML cells is higher than that in heart, skeletal muscle, and bone marrow. A more quantitative analysis will be necessary to substantiate these variations in expression, but ubiquitous expression of TREX1 is clear from these results.
FIG. 2. A 5′-RACE analysis of the TREX1 ORF. The TREX1 cDNAs (1-7) were recovered by 5′-RACE, and the sequences were aligned with the TREX1 genomic sequence (filled box and solid line). The alignment identifies sequences present in the cDNAs and genomic DNA (dotted lines) and sequences removed from cDNAs by RNA splicing (solid, bent lines). The 5′ end positions (arrows), the NNPP-predicted promoters (−650 and −140), and the TREX1-specific primers (T1 rv1 and T1 rv2) are indicated. Hu, human.
The 5′-Flanking Region of TREX2 Transcripts-In previous work from this laboratory, an active 3′→5′ exonuclease was generated from a single mouse Trex2 EST identified in the GenBank data base (17). To date only Trex2 ESTs from mouse have been deposited in the GenBank data base. A PCR strategy was developed to identify human TREX2 cDNAs and to investigate the 5′-flanking region of these cDNAs using a 5′-RACE procedure. A two-round PCR was designed using spleen cDNA with nested TREX2-specific reverse primers (T2 rv1 and T2 rv2) and the cDNA adapter primers. Six independent clones ranging in length from 95 to 959 bp were recovered, sequenced, and aligned with the TREX2 genomic sequence (Fig. 4).
FIG. 5 (legend, fragment). The PCR products (643, 130, and 99 bp) are predicted using the indicated nested primer pairs (T2 fr1, T2 fr2 and T2 rv1, T2 rv2). Agarose gel electrophoresis of the PCR products generated using reverse transcribed AML RNA (B, lane 2) indicates the presence of all three TREX2 transcripts. Lane 1 contains DNA size standards. Total RNA from various human tissues was subjected to RT-PCR using the TREX2-specific T2 fr2 and T2 rv2 primers. Agarose gel electrophoresis of the PCR products (C) indicates the presence of only the 643-bp unspliced TREX2 transcript in all tissues. Hu, human.
The genomic sequence in the 5′-flanking region of TREX2 was examined using the NNPP program to identify potential transcription initiation sites. Two potential promoters positioned −623 and −753 bp from the TREX2 ORF were identified (Fig. 4). The 5′ ends of five TREX2 cDNAs (Fig. 4, labeled 1-5) align with the genomic sequence at positions indicating transcription initiation at one of these consensus putative promoter sequences. The 5′ end of another cDNA (Fig. 4, labeled 6) was positioned 5′ to both predicted promoters, indicating additional or alternative promoters are present upstream in the 5′-flanking region of TREX2.
Splicing of the TREX2 Transcripts and Expression in Human Tissues-The genomic DNA sequence in the 5′-flanking region of the human TREX2 was examined for possible splice donor and acceptor sites. One potential intron donor sequence and two potential acceptor sequences were identified, suggesting the possibility for a processing pathway for TREX2 transcripts similar to that for TREX1 transcripts (Fig. 5A). Thus, like the TREX1 transcripts, two splicing pathways are possible for processing of the TREX2 transcripts. However, unlike processing of a TREX1 transcript, splicing from the donor site to acceptor site A or to acceptor site B generates a TREX2 transcript encoding the complete ORF. A two-round PCR was performed using the nested TREX2 ORF-specific primers (T2 rv1 and T2 rv2) and the nested 5′-flanking region primers (T2 fr1 and T2 fr2) with AML cDNA to recover TREX2 cDNAs. The three possible TREX2 transcripts are detected upon agarose gel electrophoresis of the PCR products (Fig. 5B). Sequencing of the cloned products confirmed the identity of the most abundant 643-bp band as the product of the unspliced TREX2 transcript. The 130- and 99-bp bands resulted from amplification of the two spliced TREX2 transcripts. The relative intensity of the bands likely reflects the relative abundance of the TREX2 transcripts in AML cells with the unspliced TREX2 transcript being the predominant transcript. These results provide the first evidence for expression of the TREX2 gene in human cells and identify an RNA splicing process that removes a 513- or a 544-bp intron from the 5′-flanking region of the TREX2 transcripts. The sequence of the mouse Trex2 EST (no. AA060540) indicates splicing in the 5′-flanking region from the donor site to acceptor site B to remove a 623-bp intron, indicating conservation of this RNA splicing mechanism between mammalian species (data not shown).
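The relationship between the band sizes and the intron lengths reported for both genes is a subtraction: a spliced product is shorter than the unspliced product by the length of the removed intron. The sketch below checks the reported numbers; the function name is hypothetical, and the 430-bp value for the second TREX1 splicing pathway is derived here from the band sizes rather than stated in the text.

```python
def spliced_product_bp(unspliced_bp: int, intron_bp: int) -> int:
    """Expected amplicon length after removal of one intron."""
    return unspliced_bp - intron_bp

# TREX1: 532-bp unspliced product, 320-bp intron removed via acceptor site A.
trex1_site_a = spliced_product_bp(532, 320)

# TREX2: 643-bp unspliced product, 513- or 544-bp intron removed.
trex2_products = [spliced_product_bp(643, intron) for intron in (513, 544)]

# Derived value (not reported in the text): the intron length implied by the
# 102-bp TREX1 product from splicing to acceptor site B.
trex1_site_b_intron = 532 - 102

print(trex1_site_a, trex2_products, trex1_site_b_intron)
```

The computed sizes of 212 bp for TREX1 and 130 and 99 bp for TREX2 agree with the bands described for the gels.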
An experiment was designed to measure the gene expression pattern for TREX2 in human cells. The pattern of TREX2 gene expression was determined in a series of RT-PCRs using the T2 fr2 and T2 rv2 primers and reverse transcribed total RNA from 13 different tissues (Fig. 5). The PCR product generated from the unspliced TREX2 transcript was detected in all tissues tested (Fig. 5C). The spliced TREX2 transcripts were not detected in this single-round PCR (data not shown), indicating that the unspliced form of the TREX2 transcript is the most abundant in all tissues tested. The staining intensities of the 643-bp products generated from thymus and spleen cDNA were greater than those from heart, skeletal muscle, and testis, suggesting some variations in expression levels of TREX2 in human cells. The recovery of TREX1 and TREX2 transcripts from all human tissues tested implicates these proteins in housekeeping functions such as the DNA repair processes necessary to maintain the integrity of the human genome.
Identification of Novel TREX1 Transcripts-Previous mapping experiments located the human TREX1 ORF to chromosome 3p21.3-21.2 (43). However, limited genomic sequence in this region precluded a detailed analysis of the sequence surrounding the TREX1 ORF. More recently, we used the TREX1 cDNA (no. AF151105) as the query sequence in a BLAST search of the GenBank high throughput genomic sequences data base to identify the human BAC clone RP11-24C3 (no. AC021328). The presence of TREX1 on this genomic DNA fragment confirms the mapped position of TREX1 to 3p21.3-21.2 on the current Human Genome Map. The RP11-24C3 DNA clone is a "working draft" sequence consisting of 20 unordered contigs with gaps of ~100 bp. The correct order of the five contigs containing the TREX1 ORF was determined using human, mouse, and pig ESTs in the GenBank dbEST data base and a genomic sequence in the GenBank GSS data base (Fig. 6). A pig EST AW346748 identifies the two contigs labeled II and III immediately 5′ to the TREX1 ORF-containing contig labeled I. The translated protein sequence of the first 66 bp of the pig EST identifies the human BAC clone AQ430752 in the GSS data base. A BLAST search of the dbEST data base using the AQ430752 clone as the query sequence identifies a mouse EST AI035823 that positions the next 5′ adjacent contig labeled IV. Finally, a BLAST search of the dbEST data base using the EST AI035823 as a query sequence identifies several ESTs (i.e. no. AI031903) that indicate the position of the contig labeled V. Thus, five DNA fragments containing 25 kb of genomic DNA on RP11-24C3 have been ordered to reveal the genomic sequence positioned 5′ to the TREX1 ORF. We cannot exclude the possibility that additional contigs might be intervening but were not detected in our analysis.
A PCR strategy was developed to identify novel TREX1 cDNAs with possible transcription initiation sites positioned 18 kb 5′ to the TREX1 ORF. The potential exons contained within this 18-kb region were identified using the gene-finding algorithm GENSCAN (47). A hypothetical transcript encoding a protein of 887 amino acids containing 13 exons with the TREX1 ORF as the most 3′ exon is predicted (Fig. 6). Transcripts with this exact sequence have not been identified. However, human ESTs containing exons 1-4 (no. BE871415), exons 2-11 (no. AK022405), and exons 12 and 13 (no. BE615019) have been identified (Fig. 6). A PCR strategy was developed to identify novel TREX1 cDNAs that contain the exon sequences encoded in the 18-kb 5′-flanking region (Fig. 6). First, a two-round PCR was performed using the nested TREX1 ORF-specific primers (T1 rv2 and T1 rv1) and the nested 5′-flanking region primers (T1 fr3 and T1 fr4) with AML cDNA to amplify cDNAs containing exons 10-13. Two products were recovered from agarose gels and sequenced. The sequences confirm the presence of exons 10-13 in these TREX1 cDNAs (Fig. 6, labeled 1 and 2). Furthermore, the sequences indicate the PCR products were recovered from transcripts spliced from exon 10 to exon 11 precisely as predicted using the GENSCAN program, but neither cDNA was spliced from exon 11 to exon 12. Additional PCRs were performed using the nested TREX1 ORF-specific primers (T1 rv2 and T1 rv1) with the nested 5′-flanking region primers (T1 fr5 and T1 fr6) to amplify cDNAs containing exons 4-13 or with the nested 5′-flanking region primers (T1 fr7 and T1 fr5) to amplify cDNAs containing exons 1-13. Two additional TREX1 ORF-containing cDNAs were identified from these reactions (Fig. 6, labeled 3 and 4). The cDNA labeled 3 contains the GENSCAN-predicted exons 4-13 and two additional exons labeled 5A and 6AB. The cDNA labeled 4 contains the predicted exons 1-13 and four additional exons: 5A, 6A, 6B, and 6C.
The TREX1 ORF-containing cDNAs identified in these PCRs span the polyadenylation signal (Fig. 6, labeled poly(A) 1) identified between exons 11 and 12. To generate these transcripts, mRNA synthesis must proceed past the nonconsensus poly(A) 1 signal (AUUAAA) and through the TREX1 ORF to the consensus poly(A) 2 signal (AATAAA) (Fig. 6). These data indicate a complex pattern of transcription initiation and RNA processing for the mammalian TREX1 gene and suggest an association between the TREX1 ORF and exons identified in the 5′-flanking region.
A 5′-RACE analysis supports transcription initiation 5′ to the GENSCAN-predicted exon 1. A two-round PCR was performed using spleen cDNA with the nested primers (T1 rv5 and T1 rv4) designed from exon 1 and exon 3 sequences and the cDNA adapter primers. Three independent clones were recovered, and these sequences were aligned with the TREX1 genomic sequence (data not shown). The 5′ ends of these cDNAs are within 150 bp of the predicted initiation ATG in exon 1. Analysis of this sequence using the NNPP program identifies a potential transcription initiation site positioned 185 bp 5′ to the predicted initiation ATG in exon 1. This transcription initiation site is positioned 18 kb 5′ to the TREX1 ORF (Fig. 6).
Identification of Novel TREX2 Transcripts-In a previous report it was suggested that the TREX2 ORF was part of a larger GENSCAN-predicted ORF located in a genomic clone (no. AF002998) from chromosome Xq28 (43). This hypothetical transcript encodes a protein of 840 amino acids containing 16 exons with the TREX2 ORF as the most 3′ exon (Fig. 7). There are no ESTs that correspond precisely to this predicted transcript. However, a human EST containing several of the exons in the GENSCAN-predicted ORF (no. AF267739) has been identified (Fig. 7). A PCR strategy was designed to identify novel TREX2 cDNAs that contain exons encoded in the 5′-flanking region of the TREX2 ORF. A two-round PCR was performed using the nested TREX2 ORF-specific primers (T2 rv1 and T2 rv2) and the nested 5′-flanking region primers (T2 fr3 and T2 fr4) with AML cDNA to amplify cDNAs containing exons 1-16 in the 25-kb 5′-flanking region. Four products were recovered from agarose gels and sequenced. The resulting cDNAs confirm the presence of transcripts containing exons that span the complete 25-kb 5′-flanking region of TREX2 (Fig. 7, labeled 1-4). Many of the 16 GENSCAN-predicted exons and others are present in the TREX2 cDNAs. Exons 3, 6, 7, 8, and 14 predicted in the GENSCAN analysis are not present in any of the recovered cDNAs. Additional exons (Fig. 7, labeled 12A and X) are detected in the TREX2 cDNAs. Exon X contains multiple stop codons in all three reading frames that disrupt the potential continuous ORF in these TREX2 cDNAs. In TREX2 cDNA 3, an additional stop codon is present in exon 12 (Fig. 7). Generation of these TREX2 transcripts requires mRNA synthesis past the nonconsensus poly(A) 1 signal (AAGAAA) and through the TREX2 ORF to the consensus poly(A) 2 signal (AATAAA).
Identification of these novel TREX2 cDNAs supports the concept that transcription initiation in the 5′-flanking region might generate transcripts that could be processed by RNA splicing to contain a single ORF including the 236-amino acid TREX2 sequence as the most 3′ exon.
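Locating the polyadenylation signals discussed for both genes amounts to scanning a sequence for the consensus hexamer AATAAA and for the nonconsensus variants named in the text (AUUAAA for TREX1, AAGAAA for TREX2). The following is a minimal sketch of such a scan; the function name and the toy sequence are hypothetical, and no claim is made that this reflects how the signals in Figs. 6 and 7 were identified.

```python
CONSENSUS = "AATAAA"
# Nonconsensus hexamers named in the text, written as DNA (U replaced by T).
VARIANTS = ("ATTAAA", "AAGAAA")

def find_polya_signals(seq: str):
    """Return (index, hexamer, kind) for each poly(A) signal hexamer found."""
    dna = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(dna) - 5):
        hexamer = dna[i:i + 6]
        if hexamer == CONSENSUS:
            hits.append((i, hexamer, "consensus"))
        elif hexamer in VARIANTS:
            hits.append((i, hexamer, "nonconsensus"))
    return hits

# Hypothetical toy sequence with one variant followed by the consensus signal.
toy = "GGAAGAAACCCAATAAAGG"
print(find_polya_signals(toy))
```

A transcript that reads through an upstream nonconsensus hexamer and ends after the downstream consensus hexamer corresponds to the read-through situation described for the novel TREX cDNAs.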
A 5′-RACE analysis also supports transcription initiation 5′ to the GENSCAN-predicted exon 1. A two-round PCR was performed using spleen cDNA with nested primers designed from exon 10 sequences (T2 rv3 and T2 rv4) and the cDNA adapter primers. Two clones were recovered, and these sequences were aligned with the TREX2 genomic sequence (data not shown). The 5′ ends of these cDNAs are located near the predicted initiation ATG in exon 1. Analysis of this sequence using the NNPP program identifies a potential transcription initiation site positioned upstream from the predicted initiation ATG in exon 1 (Fig. 7). This transcription initiation site is more than 25 kb 5′ to the TREX2 ORF. The detection of novel TREX1 and TREX2 transcripts containing a dicistronic structure indicates a complex pattern of expression for these genes. The potential relationships between the ORFs encoded in the 5′-flanking regions and the TREX1 and TREX2 ORFs are not apparent. Furthermore, the ability to translate the TREX ORFs within the context of the dicistronic transcript has not been tested.
In conclusion, we have demonstrated that the TREX1 and TREX2 genes encode mammalian 3′→5′ exonucleases that are expressed in all human tissues examined. There are a number of similarities in the structures and in the expression patterns of the TREX genes. The genomic sequence encoding the 314-amino acid TREX1 protein is contained in a single ORF. The 236-amino acid TREX2 protein is also encoded in a single ORF. For both TREX1 and TREX2, transcripts are initiated within 1 kb of the exonuclease ORFs, and intronic sequences are removed from the 5′-untranslated region by two possible RNA splicing pathways. Additional sites of transcription initiation are identified at positions 18 kb 5′ to the TREX1 ORF and 25 kb 5′ to the TREX2 ORF, generating transcripts that contain the TREX ORF and a second upstream ORF of unknown function. The detection of TREX1 and TREX2 transcripts in all human tissues tested indicates the ubiquitous expression of these genes and supports a requirement for these 3′ exonucleases in DNA repair pathways in human cells.