Human Transaldolase-associated Repetitive Elements Are Transcribed by RNA Polymerase III

Repetitive elements flanked by exons 2 and 3 of the human transaldolase gene, thus termed transaldolase-associated repetitive elements (TAREs), were identified in human DNA. Nonpolyadenylated TARE transcripts were detected by Northern blot analysis and cloned by reverse transcriptase-mediated polymerase chain reaction from human T lymphocytes. A dominant 1085-nucleotide long transcript, TARE-6, contained two adjacent Alu elements, a right monomer and a complete dimer, oriented opposite to the direction of transcription of the transaldolase gene. Reverse transcriptase-polymerase chain reaction and in vitro transcription analyses showed that transcription of TARE-6 proceeded in the orientation of the RNA pol III promoter of the Alu dimer and opposite to the orientation of the TAL-H gene. TAREs lacking the RNA polymerase III promoter showed no transcriptional activity. In vitro transcription of TARE-6 was resistant to 1 μg/ml α-amanitin but sensitive to 100 μg/ml α-amanitin and to tagetitoxin, suggesting involvement of RNA polymerase III. TAREs in both the transaldolase and HSAG-1 genomic loci were surrounded by TA target site duplications. Homologies between transaldolase and HSAG-1 break off internally at splice donor and acceptor sites. The results suggest that RNA polymerase III-mediated transcription of TARE may be a source of repetitive elements, contributing to distinct genes and thus shaping the human genome.

Retrotransposable elements make up as much as 5% of eukaryotic DNA. They are generally considered a major force in shaping the genome (1,2). Retrotransposable elements can be divided into two major classes: those with long terminal repeats (LTRs) and those without. The LTR class of elements replicates through RNA intermediates, similar to retroviruses (2). In contrast, little is known about the mechanism of retrotransposition of the second class of retrotransposons, the non-LTR elements. Representatives of this class in the human genome include the long (6-7 kb) interspersed nucleotide elements (LINEs) (3) and the short (90-400 bp) interspersed nucleotide elements (SINEs) (4). These elements lack most of the genes encoded by the LTR-containing retrotransposons. The RNA intermediate involved in retroposition of LINEs is transcribed by RNA polymerase II, while that for SINEs is produced by RNA polymerase III. Both LINEs and SINEs are present on the order of 10^5 copies dispersed throughout the genome. A third family of non-LTR retroposons comprises the retropseudogenes. They represent cDNA copies of fully processed mRNA transcripts. These retropseudogenes lack the introns present in the chromosomal locus and always include a poly(A) tract at their 3′ end. Since retropseudogenes are deprived of promoter elements located upstream from the transcription initiation site in the parental gene locus, retrotransposition of a correctly initiated mRNA will result in an inactive retropseudogene. However, on rare occasions, retrotransposition of fully processed mRNAs can lead to the creation of new functional genes such as the rat and mouse preproinsulin I gene (5) and jingwei in Drosophila (6).
Evolution of split genes has been suggested to occur by development of a splicing system that could join discontinuous gene structures to make functional proteins or, alternatively, by intron mobility, i.e. a reversal of the splicing process at the RNA level (7-9) or insertion of transposable elements into pre-existing genes (10). The exon-intron structure of genes has been very important in the generation of new genes during evolution (9). Approximately 1 of every 20 genes is expressed by alternative pathways of RNA splicing. As an example, this mechanism allows tissue or growth stage-specific production of 20 different proteins from a single fibronectin gene (11).
We have previously determined that the coding sequence of the human transaldolase (TAL-H) gene contains a transaldolase-associated repetitive element (TARE) (12). TAREs constitute a family of 1,000 to 10,000 repetitive elements encompassed by the 2nd and 3rd coding exons of TAL-H and contain internal sequences of variable length. Nonpolyadenylated TARE transcripts were detected by Northern blot analysis and cloned by reverse transcriptase-mediated polymerase chain reaction (RT-PCR) from human T lymphocytes. A dominant transcript, TARE-6, contained a typical RNA polymerase III promoter within an internal Alu dimer. TARE-6 was transcribed in vitro in the orientation of the RNA pol III promoter. Resistance to α-amanitin, sensitivity to tagetitoxin, and mutagenesis studies showed the dependence of TARE-6 transcription on RNA polymerase III. Transcriptionally active TAREs have contributed to distinct functional genes and may be a source of mobility in the human genome.

MATERIALS AND METHODS
Southern and Northern Blot Analysis-Genomic DNA was isolated from peripheral blood lymphocytes (PBL) and cell lines, digested with restriction enzymes, electrophoresed in a 0.7% agarose gel, blotted to nylon membrane, and hybridized to 32P-labeled DNA probes under high stringency conditions as described previously (13). Total RNA was extracted by the RNAzol method (19). Poly(A)+ RNA was isolated by binding to a poly(U) Sephadex column (Life Technologies, Inc.), fractionated in 1% glyoxal gels, and transferred to nylon membranes (13).
Screening of Genomic Library-A human lymphocyte genomic DNA library was prepared in DASH phage (Stratagene, La Jolla, CA) and screened with TAL-H cDNA fragments 4/2 and 4/1 (12) under high stringency conditions as earlier described (13).
PCR-10 ng of DNA was subjected to PCR amplification for 30 cycles under the following conditions: denaturing at 94°C for 1 min, annealing for 1 min at 40-65°C (determined experimentally for each primer pair), and primer extension at 72°C for 1 min. Following amplification, 10 μl of the 100-μl PCR reaction volume was electrophoresed in a 2% agarose gel, transferred to a nylon membrane in 0.4 N NaOH, and hybridized to 32P-labeled probe as earlier described (14,15).
RT-PCR-3 μg of total RNA was reverse transcribed into cDNA by 200 units of Superscript reverse transcriptase (Life Technologies, Inc., Bethesda, MD) using 80 ng of oligonucleotide or 50 ng of random hexamers as primers for 10 min at room temperature and then for 50 min at 42°C. The reaction was terminated by heating at 90°C for 5 min. After chilling on ice, RNA template was digested with 2 units of Escherichia coli RNase H (Life Technologies, Inc.) at 37°C for 20 min. Subsequently, the first strand cDNA was subjected to PCR. RT-PCR fragments were cloned into the pCR2.1 vector (Invitrogen, San Diego, CA) and sequenced by the chain termination method (16).
Sequencing of the TAL-H Locus-Three partially overlapping DASH genomic clones were analyzed (12). To assure reliable sequence data for Alu-rich regions of the TAL-H locus, two adjacent EcoRI fragments (between nucleotides 3541-10172 and 10167-12772) were cloned in the pSP72 vector for transposon-mediated sequencing (17). pSP72-based plasmids were transformed into HB101F′ cells. Transposon insertions were created by mating HB101F′ donor strains with the JGM recipient strain and plating out the progeny on doubly selective plates (50 μg/ml ampicillin and 50 μg/ml kanamycin). 28 plasmids with γδ transposons spaced 300-400 base pairs apart were selected by PCR-based mapping using primer NGDIR (5′-GTTCCATTGGCCCTCAAAC-3′) located at both ends of the transposon and PM001 (5′-CGTTAGAACGCGGCTACAAT-3′) or PM002 (5′-GCCGATTCATTAATGCAGGT-3′) primers flanking the multiple cloning site of pSP72. Thus, M13 forward (5′-TGTAAAACGACGGCCAGT-3′) and reverse (5′-CAGGAAACAGCTATGACC-3′) sequencing priming sites, carried by the transposon, were introduced throughout the target sequence at 300-400-nucleotide intervals. Sequences of both strands were determined and analyzed with the University of Wisconsin Genetics Computer Group (GCG) software (18). Alu sequences were identified with the Pythia program (19,20). The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL data base.
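As an illustration only, the mapping and sequencing primers quoted above can be summarized by length, GC content, and a rough melting temperature. The sketch below is not part of the original methods: it uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a common rule of thumb for short oligonucleotides, and the names `PRIMERS`, `gc_fraction`, and `wallace_tm` are hypothetical. Sequences are written without line-break hyphens.

```python
# Primer sequences as quoted in the text, written 5'->3'.
PRIMERS = {
    "NGDIR": "GTTCCATTGGCCCTCAAAC",
    "PM001": "CGTTAGAACGCGGCTACAAT",
    "PM002": "GCCGATTCATTAATGCAGGT",
    "M13_forward": "TGTAAAACGACGGCCAGT",
    "M13_reverse": "CAGGAAACAGCTATGACC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature (deg C): 2*(A+T) + 4*(G+C).

    This is an assumption of the sketch, not the method used in the paper,
    where annealing temperatures were determined experimentally.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"Wallace Tm {wallace_tm(seq)} C")
```

Such estimates are only a starting point; the annealing temperatures reported here were determined experimentally for each primer pair.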
In Vitro Transcription-As TARE-specific templates, RT-PCR clone 2/8 in Bluescript KS+ plasmid, gel-purified full-length TARE-6-equivalent 2/8 insert and its fragments, truncated at the TAL-H exon 3-proximal end, were utilized. The VA1 template pVA1 (XbaI-BalI fragment from the adenovirus type 2 genome cloned into the XbaI/SmaI sites of pUC12) was used as positive control for RNA polymerase III (pol III)-mediated transcription (21). The adenovirus major late promoter sequence from −400 to +10 fused to a G-less cassette in the pML(C2AT) plasmid was used as positive control template for RNA polymerase II (pol II)-mediated transcription (22). 8 μl of HeLa cell nuclear extract (Upstate Biotechnology, Lake Placid, NY) was incubated with 0.8 μl of 0.5 μg/μl template for 10 min on ice to allow binding of transcription factors to DNA. Reaction mixtures (11.2 μl) containing 10 mM HEPES (pH 7.9), 10% glycerol, 0. and vacuum dried for 10 min. Precipitates were resuspended in formamide dye mixture and electrophoresed on an 8% polyacrylamide gel containing 8 M urea, and gels were exposed to x-ray film.
Inhibitors of Transcription-α-Amanitin, a bicyclic octapeptide from the mushroom Amanita phalloides, which selectively inhibits RNA pol II-mediated transcription at low concentrations (23), was obtained from Sigma. Tagetitoxin, an inhibitor of RNA pol III (24), was obtained from Epicentre Technologies (Madison, WI). In vivo transcription in Jurkat (ATCC number CRL8163) and Molt-4 T cell leukemia cell lines (ATCC number CRL1582) was inhibited by a 5-h incubation with 50 μg/ml α-amanitin in RPMI 1640 medium supplemented with 2 mM L-glutamine, 100 units/ml penicillin, 100 μg/ml gentamicin, and 10% fetal calf serum (Life Technologies, Inc.) as earlier described (25). The effect of α-amanitin on in vivo transcription was assessed by Northern blot analysis of total RNA extracted from α-amanitin-treated and untreated cells. Transcription by RNA polymerase I was monitored by a 28 S ribosomal probe (pES-28 S cloned into the EcoRI/SalI sites of pGEM4, kindly provided by Dr. Joan Steitz of Yale University). Inhibition of RNA pol II was evaluated by levels of transcription of human β-actin (26) and the 4/1 segment of TAL-H cDNA (12). A 7SL RNA probe (pSP7SL cloned into EcoRI/XbaI sites of SP64, a gift from Dr. Peter Walter of the University of California at San Francisco) was utilized to monitor RNA pol III transcription.
Primer Extension-Total RNA or in vitro transcribed RNA was used as template. Prior to primer extension, radiolabeled and non-radioactive in vitro transcription reactions were carried out and separated in a 4% polyacrylamide gel with 7 M urea. TARE transcripts were excised from the gel and passively eluted in 500 mM ammonium acetate, 10 mM MgCl2, 1 mM EDTA, 0.1% SDS at room temperature overnight. Then, RNA was phenol/chloroform extracted, ethanol-precipitated, and used for primer extension as earlier described (27). Oligonucleotide primers 5′ end-labeled with [γ-32P]ATP by T4 polynucleotide kinase were annealed with 5 μg of total RNA or 0.5 μg of gel-purified in vitro transcribed RNA at 70°C for 10 min and chilled on ice for 2 min. RNA was resuspended in extension buffer containing 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 2.5 mM MgCl2, 0.5 mM dNTP, preincubated at 42°C for 5 min, then 200 units of Superscript II reverse transcriptase (Life Technologies, Inc.) was added, and the sample was incubated at 42°C for 50 min. The reaction was terminated by heating at 70°C and chilling on ice. Samples were analyzed in a 6% sequencing gel. cDNA generated by primer extension was cloned by anchored PCR using the 5′ rapid amplification of cDNA ends kit from Life Technologies, Inc. (Gaithersburg, MD). Briefly, cDNA generated with the 4/2ORFc primer was treated with RNase, purified on a Glassmax spin cartridge, tailed with dCTP and terminal deoxynucleotidyl transferase (TdT), amplified with nested 4/2ORFd and deoxyinosine-containing anchor primers, and cloned into the pCR2.1 vector.

RESULTS
Detection and Cloning of Transaldolase-associated Repetitive Elements, TAREs-The full-length TAL-H cDNA clone 4/2-4/1 contains 474-bp 5′ (4/2 subclone) and 827-bp 3′ (4/1 subclone) EcoRI fragments (Fig. 1A). Southern blot hybridizations under high stringency conditions revealed that the 5′ segment of the TAL-H cDNA was repetitive while the 3′ fragment appeared to be a single copy element in the human genome (Fig. 1B). The functional TAL-H gene has been mapped to a single copy genomic locus (TALDO1) on human chromosome 11 at p15.4-p15.5 (28). Based on comparative Southern blot analyses and screening of two human lymphocyte genomic DNA libraries, the copy number of TARE was estimated between 1,000 and 10,000 per haploid genome (12).

TARE Units Are Bounded by Terminal Segments Corresponding to TAL-H Exons 2 and 3-Boundaries of the repetitive element within the TAL-H gene were assessed by gene amplification via PCR from genomic DNA of normal human peripheral blood lymphocytes. A panel of overlapping primer pairs spanning exons 1-8 of TAL-H cDNA was utilized. A series of TARE fragments was amplified by PCR using oligonucleotide primers spanning exons 2 and 3 of TAL-H (Fig. 2, primer set a). These fragments (designated as TAREs 1-6, corresponding to sizes of 145, 438, 508, 691, 779, and 992 bp in Fig. 2) were cloned and sequenced (Fig. 3A). Each TARE unit was bounded by highly conserved regions corresponding to exons 2 and 3 of the TAL-H gene in an orientation matching that of the TAL-H locus (Fig. 3B). TARE could not be amplified with a 5′ primer from TAL-H exon 1 (not shown) or 3′ primers from exons 4 through 8 (Fig. 2, primer pairs b, c, and d). Presence of similar TARE units 2-6 was confirmed in peripheral blood lymphocytes of four normal donors and Jurkat T-cell leukemia cells using primers derived from exons 2 and 3 (Fig. 4). This suggested that the repetitive element was confined to exons 2 and 3 of the TAL-H gene.
All TARE units showed >98% sequence identity in regions corresponding to exons 2 and 3 of the TAL-H gene. TARE-6 was the longest among the TARE units with related internal sequences. The most notable site of sequence variation among different TARE-6 clones was a G → C substitution at nucleotide position 974, converting a splice acceptor site in the sense orientation into a splice donor site in the antisense orientation (Fig. 3A). While TARE-1 contained no intron, other members of the TARE family (TAREs 2-5) harbored related internal sequences successively truncated at their TAL-H exon 3-proximal end, corresponding to splice sites in the antisense orientation. This may account for the varying size distribution of TAREs in genomic DNA.
Alignment of TAREs with the TAL-H Genomic Locus-In contrast to the repetitive exons 2 and 3, the functional TAL-H gene locus (TALDO1) is a single copy element (28). The TAL-H locus contains 28 Alu elements, an unusually high number (Table I). Alignment of TARE-6 with the TAL-H locus demonstrated considerable homology between corresponding regions flanked by exons 2 and 3 (Fig. 5A). TARE-6 clones harbor two adjacent Alu cassettes. Both of these Alus in TARE-6 are in the antisense orientation with respect to the direction of transcription of the TAL-H gene. Based on alignment with major Alu subfamilies (19,20), the Alus of TARE-6 were identified as an Alu-Y right (R) monomer and an Alu-Y dimer (Fig. 3A). An Alu-Y R monomer and an Alu-Y dimer matching those in TARE-6 were found at base positions 8920-9085 and 11185-11460, i.e. 2100 nucleotides apart in the TAL-H locus. Interestingly, in TAL-H DNA, the Alu-Y R monomer is adjacent to an older Alu-J dimer (base positions 9099-9376). Thus, in comparison to the TAL-H locus, TARE-6 appears internally deleted between two Alu elements. Discontinuation of homologies and possible sites of recombination between the Alu-J (9099-9376) and Alu-Y repeats (11185-11460) are shown in Fig. 5A. These results suggest that TARE-6 may have evolved from the TAL-H locus via recombination between these Alu elements. In the antisense orientation, Alu cassettes of TARE-6 contain several potential splice donor sites (29) which may have been utilized in formation of TARE-3, -4, and -5 (Fig. 3, A and B). Non-Alu intronic sequences of TARE isolates 2-6 are homologous, which further supports their common origin.
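The coordinate relationships quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming 1-based inclusive coordinates on the TAL-H locus; the `Feature` type and the helper names are hypothetical and serve only to make the spacing between the Alu elements explicit.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A named interval on the TAL-H locus (1-based, inclusive; assumed)."""
    name: str
    start: int
    end: int


# Base positions as quoted in the text.
ALU_Y_MONOMER = Feature("Alu-Y right monomer", 8920, 9085)
ALU_J_DIMER = Feature("Alu-J dimer", 9099, 9376)
ALU_Y_DIMER = Feature("Alu-Y dimer", 11185, 11460)


def length(feature: Feature) -> int:
    """Number of bases covered by an inclusive interval."""
    return feature.end - feature.start + 1


def separation(upstream: Feature, downstream: Feature) -> int:
    """Coordinate distance from the end of one feature to the start of the next."""
    return downstream.start - upstream.end


if __name__ == "__main__":
    print(separation(ALU_Y_MONOMER, ALU_Y_DIMER))  # distance between the two Alu-Y elements
    print(separation(ALU_Y_MONOMER, ALU_J_DIMER))  # distance to the adjacent Alu-J dimer
    print(length(ALU_J_DIMER))
```

A negative or zero `length` for any element would flag a mistyped coordinate pair.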
HSAG-1 Locus Contains TARE with an Unrelated Internal Sequence-TARE-equivalent segments were noted in a previously described 3.4-kb element, HSAG-1, capable of inducing a leukemia-associated surface antigen (30). A sequence homology of 88% between exon 2 of the TAL-H gene and HSAG-1 (nucleotides 1173-1297) and a homology of 93% between exon 3 of the TAL-H gene and HSAG-1 (nucleotides 2560-2668) were noted. Sequence alignments showed that homologies between TAL-H exons 2 and 3 and the two segments of similar length, 1262 nucleotides apart, in HSAG-1 broke off at internal splice sites (Fig. 5B). Moreover, homologous TARE regions in the TAL-H and HSAG-1 loci were bounded by TA direct repeats (Figs. 3B and 5B). TA is the target sequence of most prokaryotic and eukaryotic transposable elements (31-37), suggesting that these direct repeats may correspond to target site duplications flanking TARE elements in both the TAL-H and HSAG-1 loci. Unlike exons 2 and 3 of the TAL-H gene, which encode a 78-amino acid long region of the TAL-H protein (12), homologous segments in the HSAG-1 locus do not encode such a protein (30,38,39). HSAG-1 was found to elicit expression of a 14-20-kDa leukemia-associated cell-surface antigen (30). This latter protein is clearly distinct from TAL-H since mAb 37-28, specific for the HSAG-1-induced protein (30), did not react with a TAL-H cDNA-encoded recombinant protein or the native 38-kDa TAL-H protein (not shown). Furthermore, unlike HSAG-1, TAL-H protein is confined to the cytoplasm (12,40).
Cloning of TARE Transcripts by RT-PCR-Transposition of repetitive elements may be accomplished through DNA or RNA intermediates (2). In order to evaluate whether TARE is transcribed into RNA, total RNA and poly(A)+ RNA from Jurkat cells were analyzed by Northern blot hybridizations. As shown in Fig. 1C, the 5′ (4/2) probe hybridized to a number of abundant 0.5-7.5-kb RNA species, whereas the 3′ (4/1) probe annealed to a single 1.3-kb transcript in total cellular RNA. In poly(A)+ RNA, however, both the 5′ and 3′ probes annealed to a single 1.3-kb message. The 5′ probe also hybridized to a faint larger molecular weight band that may represent a small carryover of non-polyadenylated RNA. The abundant 0.5-7.5-kb transcripts recognized by the 5′ probe in total RNA were absent in poly(A)+ RNA. These results suggested that TARE transcripts may be nonpolyadenylated.
TARE transcripts were further analyzed in total RNA by RT-PCR using a panel of TARE-6-specific oligonucleotide primers. Dominant TARE-6 RT-PCR products were found in total RNA of normal human peripheral blood lymphocytes and Jurkat T cell leukemia cells (Fig. 6). A 234-bp product, amplified from TAL-H exon 3 antisense oligonucleotide-   (Fig. 6B) or random hexamers but not with oligo(dT) or TAL-H antisense primers (not shown). Relatedness of this 992-bp antisense RT-PCR product to TARE-6 was demonstrated by nested amplification which resulted in detection of 865- and 774-bp internal fragments using TARE-6-specific internal primer sets, e and f, respectively (Fig. 6A). A TARE-6-specific 1085-bp RT-PCR product was cloned from Jurkat cells (clone 2/8) and compared with genomic TARE-6 DNA clone 2/16. They showed less than 3% sequence divergence. Both the right monomer and dimer Alu elements within clones 2/8 and 2/16 belonged to the same Alu-Y subclass (20). Interestingly, 31 of 42 sequence differences between clones 2/8 and 2/16 were clustered within a 173-nucleotide long segment (between residues 634 and 806) corresponding to the right arm of an Alu-Y dimer in clone 2/8. The same area of TARE-6 was also rearranged and disrupted by Alu-mediated recombination in the TAL-H gene locus (Figs. 3B and 5A). Preservation of a typical Alu-Y element in clone 2/8 may be important for transcription of this TARE unit. Another RT-PCR fragment from Jurkat cells, clone 1/7, representing a 484-bp primary RT-PCR product, exhibited a sequence homology of 100% with TARE-3 (Fig. 3A), except that clone 1/7 had a 25-bp truncation at the 3′ end of the internal sequence. This truncation may reflect an additional splicing pattern of TARE in Jurkat cells. A typical RNA polymerase III internal split promoter (41)  promoter is opposite to that of the TAL-H gene which may be responsible for the antisense transcriptional orientation of TARE-6, as evidenced by the RT-PCR analysis. 
In other less efficiently transcribed TARE elements, the RNA pol III promoter was internally deleted. Internal deletions in TARE units 2-5 corresponded to splice sites in the antisense orientation concurrent with the direction of TARE-6 transcription, suggesting a precursor-product relationship between TARE-6 and the shorter TARE species.
TAREs Are Transcribed by RNA Polymerase III-To further assess the mechanism of TARE transcription, RNA polymerase inhibitors were utilized. Jurkat cells were pretreated with 50 μg/ml α-amanitin, a relatively selective inhibitor of RNA polymerase II (23), for 5 h prior to extraction of RNA. Northern blots were hybridized under high stringency and washed in 0.1× SSC, 0.1% SDS at 65°C. The 3′ probe specifically annealed to the 1.3-kb TAL-H transcript (Fig. 7A). As expected, α-amanitin inhibited transcription of the 1.3-kb TAL-H mRNA recognized by both the 3′ and 5′ probes of 4/2-4/1 cDNA. The 5′ probe also annealed to four additional RNA species. The identity of two other α-amanitin-inhibited transcripts, 7 and 3.5 kb, is unknown. An abundant 5-kb RNA species cross-hybridized with a 28 S ribosomal probe, pES-28 S. Abundance of 28 S RNA, transcribed by RNA pol I, was not affected by α-amanitin. As a control, a 7SL RNA probe was utilized to monitor relative resistance of RNA pol III transcription to inhibition by α-amanitin. The 5′ (4/2) probe also hybridized to an α-amanitin-resistant 1-kb transcript (Fig. 7A).
Transcription of TARE-6 was further investigated by in vitro transcription in the presence of RNA polymerase inhibitors. α-Amanitin selectively inhibits RNA pol II at a low concentration (1 μg/ml), while it also inhibits RNA pol III at a high concentration (100 μg/ml) (23). Tagetitoxin is a specific inhibitor of RNA pol III (24). As shown in Fig. 7B, TARE-6 was efficiently transcribed in vitro by HeLa cell nuclear extract, used as a source of transcription factors. 1 μg/ml α-amanitin completely abrogated RNA pol II-mediated transcription of pML(C2AT), while transcription of an RNA pol III-dependent template, pVA1, and that of TARE-6 were not affected (Fig. 7B). By contrast, 100 μg/ml α-amanitin also inhibited transcription of pVA1 and TARE-6 templates. Tagetitoxin suppressed transcription of TARE-6 and pVA1, while it had no effect on transcription of pML(C2AT). Bluescript KS+ or pCR2.1 vectors alone were not transcribed by HeLa nuclear extract (not shown).
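The logic of the inhibitor experiment can be summarized as a small decision rule. The sketch below is illustrative only and encodes nothing beyond the sensitivities stated in the text: RNA pol II is inhibited by 1 μg/ml α-amanitin, whereas RNA pol III is resistant to 1 μg/ml α-amanitin but sensitive to 100 μg/ml α-amanitin and to tagetitoxin. The names `InhibitorProfile` and `infer_polymerase` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InhibitorProfile:
    """Whether transcription of a template was inhibited under each condition."""
    low_amanitin: bool   # 1 ug/ml alpha-amanitin
    high_amanitin: bool  # 100 ug/ml alpha-amanitin
    tagetitoxin: bool


def infer_polymerase(profile: InhibitorProfile) -> str:
    """Assign a polymerase from the inhibitor pattern described in the text."""
    if profile.low_amanitin:
        # Sensitivity to the low alpha-amanitin dose is the pol II signature.
        return "RNA pol II"
    if profile.high_amanitin and profile.tagetitoxin:
        # Resistant at the low dose, sensitive at the high dose and to tagetitoxin.
        return "RNA pol III"
    return "inconclusive"


# Patterns reported for the control templates and for TARE-6 (Fig. 7B).
PML_C2AT = InhibitorProfile(low_amanitin=True, high_amanitin=True, tagetitoxin=False)
PVA1 = InhibitorProfile(low_amanitin=False, high_amanitin=True, tagetitoxin=True)
TARE_6 = InhibitorProfile(low_amanitin=False, high_amanitin=True, tagetitoxin=True)

if __name__ == "__main__":
    for name, profile in [("pML(C2AT)", PML_C2AT), ("pVA1", PVA1), ("TARE-6", TARE_6)]:
        print(name, "->", infer_polymerase(profile))
```

TARE-6 shares the pattern of the pol III control template pVA1, which is the basis for the assignment made here.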
Transcriptional regulatory elements in TARE-6 were evaluated by deletion studies and site-directed mutagenesis. TARE-5, lacking both A and B boxes of the RNA pol III promoter, showed no transcriptional activity (data not shown).  (Fig. 6). Site-directed mutagenesis at position 1 of the B box, previously associated with diminished promoter activity (41), did not affect transcription of TARE-6 (Fig. 8). The transcription start site of in vitro transcribed RNA was determined by primer extension using primer 4/2ORFc corresponding to the first 18 nucleotides of exon 3 (Fig. 9A). 4/2REVd was used as an antisense control oligonucleotide. To determine the start site of TARE-6 transcripts in vivo, primer extension studies were carried out on total RNA from Jurkat cells. As shown in Fig. 9B, 144- and 121-nucleotide long primer extension products were identified with TAL-H sense oligonucleotides 4/2ORFc and 4/2ORFd, respectively, using RNA transcribed in vitro from TARE-6 template or total RNA of Jurkat cells. No primer extension products were obtained with antisense oligonucleotides 4/2REVb and 4/2REVd. This analysis suggested that the start site of in vitro transcription of TARE-6 corresponded to that used in vivo. The results showed that TARE-6 transcripts were generated both in vitro and in vivo in the antisense direction with respect to the TAL-H gene. Primer extension products of in vivo transcribed RNA were cloned by anchored PCR. Sequencing of the 144- and 121-bp products located the start site of TARE transcripts 35 bases downstream from exon 3 of the TAL-H locus.
DISCUSSION
A unique feature of the TAL-H gene is that two of its exons are encoded by a repetitive element, TARE. Each TARE unit is bounded by terminal segments corresponding to TAL-H exons 2 and 3, in an orientation matching that of the TAL-H locus. 
The shortest element, TARE-1, contains no intron, while other members of the TARE family (TAREs 2-6) harbor related internal sequences successively truncated at their TAL-H exon 3-proximal end. Interestingly, the TAL-H locus contains an unusually high concentration of Alus, 28 within a 13,113-bp genomic segment. This is more than 6-fold higher than the average of one Alu at 3000-bp intervals (20). TARE-6 harbors two Alus, an Alu-Y right monomer and an Alu-Y dimer, the latter providing an RNA pol III promoter critical for transcriptional activity. An Alu-Y right monomer and an Alu-Y dimer matching those in TARE-6 were found at base positions 8920-9085 and 11185-11460, i.e. 2100 nucleotides apart in the TAL-H locus. Non-Alu intronic sequences of the TAL-H locus and TARE isolates 2-6 are also homologous, which further supports their common origin. In comparison to the TAL-H locus, TARE-6 appears internally deleted between two Alu elements. Between the two younger Alus, where homologies with TARE-6 are discontinued, the TAL-H locus contains five additional Alus, including three older Alu-J units (20). This analysis suggests that TARE-6 may have evolved from the TAL-H locus via Alu-mediated recombination and deletion.
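The Alu density of the TAL-H locus can be recomputed directly from the figures quoted above (28 Alus in 13,113 bp, against a genome-wide average of one Alu per 3000 bp). The sketch below is a simple arithmetic check; the function name `fold_enrichment` is hypothetical.

```python
def fold_enrichment(n_elements: int, span_bp: int, background_spacing_bp: float) -> float:
    """Ratio of the observed element density to the background density.

    The observed spacing is span_bp / n_elements; the enrichment is the
    background spacing divided by the observed spacing.
    """
    observed_spacing_bp = span_bp / n_elements
    return background_spacing_bp / observed_spacing_bp


if __name__ == "__main__":
    # Figures quoted in the text for the TAL-H locus and the genome average.
    observed_spacing = 13_113 / 28
    enrichment = fold_enrichment(28, 13_113, 3_000)
    print(f"one Alu per {observed_spacing:.0f} bp; {enrichment:.1f}-fold over background")
```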
A comparative analysis of TARE elements suggested that TAREs 2 through 5 may have spread out from a precursor which is likely to correspond to the transcriptionally active TARE-6 element. TARE-6 is transcribed in a TAL-H exon 3 → 2 orientation, possibly directed from the RNA pol III promoter of an Alu-Y dimer. Transcription of TAREs 2-5, in which the RNA pol III promoter was internally deleted, was not detected. Internal deletions in TARE units 2-5 corresponded to splice sites in the antisense orientation concurrent with the direction of TARE-6 transcription, indicating a precursor-product relationship between TARE-6 and the shorter TARE species and raising the possibility that TARE-6 may be the source of TARE invasion in the human genome.
The possibility that all TAREs represent retropseudogenes is unlikely for a number of reasons. While 5′ truncations are common in retropseudogenes (2), which could account for the absence of exon 1 in TARE, a 3′ terminal poly(A) tract is always included in retropseudogenes. Moreover, retropseudogenes derived from fully processed mRNA lack any intron present in the parental gene (2). In fact, we found a human transaldolase pseudogene (TALDOP1) containing a polyadenylated and mutated 3′ terminal fragment of TAL-H (28). This is in contrast with the absence of 3′ terminal exons in TAREs. While the intron-containing TAREs 2-6 are apparently confined to primate DNA, TARE-1 was noted in all mammalian DNA tested and may correspond to an intronless ancestral gene or inactive pseudogene (data not shown). Indeed, we found an intronless pseudogene in the mouse with several point mutations, a 17-base deletion, and an early termination codon, thus capable of encoding a maximum of 29 N-terminal amino acid residues (data not shown). While TARE-1 may represent this nonfunctional ancestral gene, TAREs 2-6 contain introns and may originate from TARE-6 transcripts. The TARE of HSAG-1 contains an intron unrelated to those of TAL-H and TAREs 2-6. Given the presence of a completely different intron in the TARE of HSAG-1, derivation of TARE from either of these genomic loci is unlikely. By contrast, TARE is the likely source of TAL-H and HSAG-1 since it is bounded by TA direct repeats in both loci. The TA direct repeats are not part of TARE or a potentially ancestral transaldolase gene since the TAL-H exon 2- and 3-equivalent segments are flanked by dissimilar G(C/T)T dinucleotides in the yeast (42) and A(C/C)T dinucleotides in E. coli (43), respectively. Presence of identical dinucleotide repeats at four strategic locations is a statistically significant finding (p = 0.0002). Retroposon insertion sites are surrounded by short direct repeats. 
This implies that the genomic DNA into which the retrotransposon inserts is broken via staggered, single-stranded cleavages and that the gaps formed as the element is joined to these ends are filled in by DNA polymerase (44). Therefore, the target site duplications around exons 2 and 3 suggest that the TAL-H gene and the HSAG-1 element may have developed by insertion of TARE. A recent survey identified TA as the target sequence of most prokaryotic and early eukaryotic transposable elements (31-37). The data are suggestive of an evolutionary model in which distinct internal sequences, like the ones in TAL-H/TARE-6 and HSAG-1, may have been captured by the repetitive element, TARE-1. Presence of typical splice donor and acceptor sites at junctions between the intervening sequences and exons 2 and 3 in TAL-H and HSAG-1 suggests that the intronic sequences have been acquired by reverse splicing (7,8). These findings can be related to a recent observation on evolution of the phosphoglycerate kinase gene in trypanosomes via intron capture (45). Thus, the present data are consistent with the introns-late hypothesis (46), in which acquisition of introns by a repetitive element, possibly via reverse splicing, may lead to development of distinct functional genes.
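The text does not state how the significance of finding identical dinucleotides at the four flanking positions was calculated. One simple null model that yields a value of the reported order, offered here only as an assumption, treats each of the four sites as an independent draw from the 16 possible dinucleotides with equal probability and asks how often all four would be identical. The function name `prob_all_identical` is hypothetical.

```python
from fractions import Fraction


def prob_all_identical(n_sites: int, n_kinds: int = 16) -> Fraction:
    """Probability that n_sites independent, uniformly drawn dinucleotides all match.

    There are n_kinds ways to pick the shared dinucleotide and n_kinds ** n_sites
    equally likely outcomes, so the probability is n_kinds ** -(n_sites - 1).
    Uniform, independent dinucleotide frequencies are an assumption of this sketch.
    """
    return Fraction(n_kinds, n_kinds ** n_sites)


if __name__ == "__main__":
    p = prob_all_identical(4)
    print(p, f"= {float(p):.6f}")
```

Real dinucleotide frequencies are not uniform, so a calculation of this kind should be read as an order-of-magnitude check rather than a reconstruction of the original test.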
Mobility of retrotransposons is accomplished through RNA intermediates (1,2). With the exception of LINE-1, capable of encoding a functional reverse transcriptase (47), reverse transcription of human retrotransposable elements remains enigmatic. While we were able to express a functionally active full-length human recombinant transaldolase protein (48), this protein did not display reverse transcriptase, DNAase/integrase, or transposase activity (data not shown). The two terminal open reading frames of TARE may potentially encode proteins other than TAL-H. This possibility is supported by the detection of α-amanitin-suppressible transcripts by the TAL-H 5′ end-specific 4/2 probe (Fig. 7A). Similar to other nonviral retrotransposons, TARE may be reverse transcribed passively (2). Alu elements are the single most abundant class of retrotransposable elements in the human genome. These elements have a dimeric structure that is comprised of two related but nonidentical Alu monomers that are homologous to an internally deleted 7SL RNA gene (49). Activity of the 7SL promoter is dependent on sequences located upstream of the transcription initiation site, and 7SL-derived Alu pseudogenes lacking upstream regulatory sequences are transcriptionally inactive (50). While monomeric Alu elements may be transcribed by RNA pol III in the brain (51,52), most Alu promoters are inactive in vivo (53). Dimeric Alu elements, such as the ones embedded in the long terminal repeat of the human transposon-like element THE-1, may be transcribed as part of RNA pol II transcription units (54). Capture of Alu-containing internal sequences seems to have been critical for transcription and, potentially, for retroposition of TARE-6. In vitro transcription studies suggested that TARE-6 is transcribed from an internal promoter by RNA pol III. Transcription of TARE-6 proceeded in the orientation of the RNA pol III promoter and opposite to the orientation of the TAL-H gene. 
The RNA pol III promoter appears to be critical for TARE transcription since (i) transcription of TARE-6 proceeds in accordance with the orientation of the pol III promoter and (ii) deletion of the pol III promoter effectively prevents transcription of shorter TARE (2-5) elements or truncated TARE-6.
TAL, which catalyzes the transfer of a 3-carbon fragment, corresponding to dihydroxyacetone, to D-glyceraldehyde 3-phosphate, D-erythrose 4-phosphate, and a variety of other acceptor aldehydes, has a pivotal role in tissue-specific function of the pentose phosphate pathway (12). Expression and enzymatic activity of TAL are regulated in a tissue-specific (40,55,56) and developmentally specific manner (57). TAL activity has a dominant effect on susceptibility to apoptosis through control of the balance between the two branches of the pentose phosphate pathway and its overall output as measured by NADPH and GSH production (58-60). Antisense RNAs naturally occurring in eukaryotic cells have been shown (i) to control stability of complementary sense transcripts (61), (ii) to interfere with processing, such as splicing, of sense RNA (62), or (iii) to encode polypeptides (63). Detection of RNA antisense to the TAL-H mRNA may be particularly interesting with respect to transcriptional regulation of transaldolase activity. In summary, TARE is a new family of transcriptionally active repetitive elements which may influence shaping and function of the human genome and specifically regulate expression of TAL-H.