A novel hybrid open reading frame formed by multiple cellular gene transductions by a plant long terminal repeat retroelement.

The discovery that vertebrate retroviruses could transduce cellular sequences was central to cancer etiology and research. Although not well documented, transduction of cellular sequences by retroelements has been suggested to modify cellular functions. The maize Bs1 transposon was the first non-vertebrate retroelement reported to have transduced a portion of a cellular gene (c-pma). We show that Bs1 has, in addition, transduced portions of at least two more maize cellular genes, namely for 1,3-beta-glucanase (c-bg) and 1,4-beta-xylan endohydrolase (c-xe). We also show that Bs1 has maintained a truncated gag domain with similarity to the magellan gypsy-like long terminal repeat retrotransposon and a region that may correspond to an env-like domain. Our findings suggest that, like oncogenic retroviruses, the three transduced gene fragments and the Bs1 gag domain encode a fusion protein that has the potential to be expressed. We suggest that transduction by retroelements may facilitate the formation of novel hybrid genes in plants.

Retroelements comprise a diverse array of mobile elements that all share a common property, the copying of RNA into DNA during a step in their life cycle. Retroelements include endogenous retroviruses and class I transposable elements (i.e. long terminal repeat (LTR) retrotransposons, long interspersed nuclear elements, short interspersed nuclear elements, and processed pseudogenes). Infectious retroviruses have been suggested to be highly evolved types of retroelements that gained the ability to infect by acquiring an envelope (env) gene (1,2). Alternatively, LTR retrotransposons may have evolved from an ancestral retrovirus through the loss of an env gene and hence the loss of infectivity (3). copia-like LTR retrotransposons have recently been reported to be horizontally transmitted (4), and some copia-like (5) and gypsy-like (6-8) elements have been found to contain an env-like gene. Retroviruses and LTR retrotransposons thus share conserved structural, functional, and mechanistic features (9). The structural and sequence similarities between gypsy-like LTR retrotransposons and retroviruses led to the postulation that they are related evolutionarily (2,10).
Retroviruses have the potential to capture cellular genes, a process commonly known as cellular gene transduction. Cellular gene transduction by retroviruses has been fundamental in studying neoplastic transformation in animals (11). Cellular gene-containing retroviruses were found to induce tumors in animals and transform cells in culture. This was central to the discovery of cellular genes with oncogenic potential (protooncogenes) (11). Cellular gene transduction is not limited to retroviruses however. For example, the maize Bs1 retroelement was reported to have acquired a segment of a plasma membrane proton ATPase gene (pma) (12)(13)(14). This raised the question as to the significance of transduction for genomes in the absence of neoplastic transformation.
Retrotransposons have been suggested to affect genome organization and function (15,16). L1 and Alu sequences were found in many known proteins (16), and L1-mediated exon shuffling (by transduction) was shown to occur in an experimental system with cultured human cells and suggested to represent a general mechanism for the evolution of new genes (17). The characterization of the human genome then revealed that transduction of 3′-flanking sequences is a common feature of L1 retrotransposition (18). Although the human PMCH1 gene is believed to have evolved by exon shuffling through retrotransposition of an antisense transcript of the MCH gene (19), the evolution of a new gene by transduction of genomic sequences adjacent to an LTR retrotransposon has not been documented.
Bs1 was identified as an insertion that inactivated the maize Adh1-S allele (20,21). Subsequent cloning and characterization revealed that Bs1 is 3203 bp in length, has 302-bp identical LTRs that are terminated by the retroviral consensus terminal sequences TG...CA, and is immediately flanked by a 5-bp target site duplication (22,23). Other features that are common to retroelements include a canonical primer binding site that immediately follows the 5′-LTR and shares similarity with plant initiator methionyl-tRNA and a polypurine tract that precedes the 3′-LTR. Although the internal sequence of Bs1 potentially encodes two overlapping open reading frames (ORF1 and ORF2), convincing similarity to typical retroelement-encoded peptides has not been demonstrated (22). The finding that Bs1 has transduced a portion of a cellular gene (pma) suggested either that, like retroviruses, an LTR retrotransposon can transduce cellular genes or that Bs1 is actually a defective retrovirus (12,13).
We report that the maize Bs1 retroelement has transduced segments from at least three different cellular genes. We show that most of the Bs1 internal sequence was replaced by the cellular gene sequences, which explains the lack of similarity to known retroelement proteins. Bs1, thus, remains the only clearly documented transduction event outside of the vertebrate retroviruses. Furthermore, Bs1 seems to have retained a truncated gag domain with similarity to the gypsy-like magellan LTR retrotransposon and a region that may correspond to an env-like gene. The transduction of multiple cellular gene segments by Bs1 seems to have generated a hybrid ORF that has the potential to be expressed. We discuss that this may be a general mechanism for the evolution of new genes.

EXPERIMENTAL PROCEDURES
Data Base Searches and Sequence Analysis-The entire Bs1 nucleotide sequence (GI 168648) and putative ORFs, ORF1 (GI 806301), ORF2 (GI 806303), and a short internal ORF (ORF3, GI 806302) that is spanned by ORF1, were used as queries in BLAST searches against GenBank™ (24) (version 2.0, www.ncbi.nlm.nih.gov/blast/). Exon and coding sequence predictions of c-bg and c-xe were performed using GENSCAN (genes.mit.edu/GENSCAN.html) developed at the Massachusetts Institute of Technology. Nucleotide sequence was translated into amino acid sequence using a modified version of the GDE translate program (bimas.dcrt.nih.gov/molbio/translate/) developed at the BioInformatics and Molecular Analysis Section of the National Institutes of Health. Sequence alignments were generated using the PILEUP program as part of the University of Wisconsin Genetics Computer Group suite of programs (version 10.0) and were further manipulated using GENEDOC (25).
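The conceptual translation step described above can be illustrated with a minimal Python sketch. It is not the GDE translate program used in this work; the function names `subsequence` and `translate` are hypothetical, and the sketch assumes the standard genetic code and forward reading frames only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def subsequence(seq, start, end):
    """Return the 1-based, inclusive region start..end of a sequence."""
    return seq[start - 1:end]


def translate(seq, frame=1):
    """Translate DNA in forward frame 1, 2, or 3; unknown codons become 'X'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = seq.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    toy = "AATGGCCTAA"  # toy sequence, not Bs1
    print(translate(subsequence(toy, 2, 10)))
    print(translate(toy, frame=2))
```

In practice a region such as a BLASTX hit would be extracted with `subsequence` and translated in the reported frame before being used as a protein query.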
Cloning of the Maize Cellular Genes-Zea mays ssp. mays CV W22 germplasm was obtained from Susan Wessler (University of Georgia). Seedlings were grown under greenhouse conditions for 3 weeks, and genomic DNA was extracted as described previously (26).
To isolate the maize cellular genes for 1,3-β-glucanase (c-bg) and 1,4-β-xylan endohydrolase (c-xe), we made use of homologous sequences available in the data bases. For c-bg, the Bs1 sequence (GI 168648, position 917-1270) was conceptually translated and used as a query in BLASTP searches. Similar proteins were used in multiple alignments, and degenerate oligonucleotides were then designed following the codon usage for Z. mays (www.kazusa.or.jp/codon/). These primers were then used in polymerase chain reactions. c-bg was amplified using a sense primer (BG21, 5′-GGTGAAGCT(G/C)TTCGAGGC(C/G)G-3′) that starts at codon number 21 of a hypothetical Arabidopsis BG protein (GI 5042412) and an antisense primer (BG1A, 5′-GGATCTGTATGGTGAAGTTGC-3′) that corresponds to the 3′ end of r-bg (GI 168648, position 1156-1176). Compared with the barley xe (GI 1813594), r-xe corresponds to a part of the 5′-untranslated region and a part of the amino terminus of the coding region. Thus, we chose a conserved region that spans the active site of the enzyme (27) and that is located approximately in the middle of the protein to design the downstream primer (XEIV-1A, 5′-CATCTCGTTGTT(G/C/A)ACGTCGTAGTG-3′). For the upstream primer, we aligned the amino acid sequences of barley (GI 1718238) (27), wheat (GI 5306060) (28), and two putative Arabidopsis XE proteins (GI 12321045 and GI 8979937) with the region corresponding to the predicted protein of r-xe (GI 168648, position 1463-1664) and designed primers that span the initiation codon (RXE1478S, 5′-CA(A/T)GGGCG(T/C)GTTCCG(G/C)C-3′). All polymerase chain reaction conditions and cloning of amplification products were as described previously (29). Sequencing was performed using a SequiTherm EXCEL II DNA sequencing kit (Epicentre Technologies, Madison, WI) with a Li-Cor LongReader IR 4200 DNA sequencer (Li-Cor, Lincoln, NE).
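Each degenerate primer above stands for a small pool of concrete oligonucleotides. The following Python sketch expands the parenthesised alternatives, e.g. (G/C), into that pool. The helper name `expand_degenerate` is hypothetical and the parser assumes only the notation used in this section (A, C, G, T and slash-separated alternatives in parentheses); any other character is ignored.

```python
import re
from itertools import product


def expand_degenerate(primer):
    """List every concrete oligonucleotide encoded by a primer such as 'AC(G/C)T'."""
    cleaned = primer.upper().replace(" ", "")
    # Each token is either a parenthesised set of alternatives or a fixed run.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT]+)", cleaned)
    choices = [alts.split("/") if alts else [fixed] for alts, fixed in tokens]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    primers = {
        "BG21": "GGTGAAGCT(G/C)TTCGAGGC(C/G)G",
        "XEIV-1A": "CATCTCGTTGTT(G/C/A)ACGTCGTAGTG",
        "RXE1478S": "CA(A/T)GGGCG(T/C)GTTCCG(G/C)C",
    }
    for name, primer in primers.items():
        pool = expand_degenerate(primer)
        print(name, len(pool), "variants, length", len(pool[0]))
```

The pool size is the product of the number of alternatives at each degenerate position, so BG21 expands to four sequences, XEIV-1A to three, and RXE1478S to eight.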

Bs1 Sequence Re-analysis-Previous characterization of Bs1 established lack of similarity to any known retroelement sequences (22). However, the wealth of new sequence information currently available prompted us to re-examine Bs1. We used the Bs1 nucleotide sequence (GI 168648) and previously identified ORFs (GI 806301-3) in extensive sequence similarity searches of the GenBank™ data base (May, 2001) (24). This revealed that Bs1 contains several domains, each of which shares similarity to multiple entries in GenBank™. The results are represented schematically in Fig. 1.
The region immediately downstream from the Bs1 5′-LTR (position 303-890) corresponds to a truncated gag domain. TBLASTN searches using the Bs1 ORF1 as a query revealed similarity with the maize gypsy-like LTR retrotransposon magellan (Fig. 2) (GI 2343274; 30% identity and 58% similarity). BLASTX searches using the Bs1 sequence as a query also revealed significant similarity (35% identity and 46% similarity) to the GAG protein of a Bombyx mori non-LTR retrotransposon (GI 4521268). Interestingly, the similarity of the Bs1 GAG domain extends to several viral capsid proteins (data not shown). For example, when the amino-terminal-most 180 residues of ORF1 were used in Position-specific Iterated BLAST searches (30), sequence similarity with turnip crinkle virus coat protein (GI 335196; 24% identity and 40% similarity) was observed. The Bs1 GAG region also shares similarity (29% identity and 46% similarity) with the 55-kDa protein of human adenovirus type 41 (GI 209892). In addition, the Bs1 GAG domain shares amino acid sequence similarity with many animal tropomyosins including the Xenopus non-muscle tropomyosin (GI 530992; 44% identity and 60% similarity). The significance of this result is unclear.
BLASTX searches also revealed similarity between the Bs1 sequence (position 917-1270; frame +2) and several basic 1,3-β-glucanases (bg) (41-47% identity; 44-60% similarity). This observation suggests that this region corresponds to a portion of a maize 1,3-β-glucanase cellular gene (c-bg) that has been transduced by Bs1. The region corresponding to bg within Bs1 is referred to as retroelement bg (r-bg) (12). r-bg corresponds to the carboxyl-terminal amino acid residues 336-419 of a hypothetical Arabidopsis BG protein (GI 5042412) (Fig. 3). Although genes encoding 1,3-β-glucanases have been cloned from a wide range of plant and fungal species (including a maize cDNA that encodes acidic 1,3-β-glucanase (31) but shares no similarity to r-bg), a maize c-bg that contributed r-bg has not been characterized. In addition, several expressed sequence tags (ESTs) from both monocots (e.g. maize, rice, sorghum, and barley) and dicots (e.g. Arabidopsis and tomato) corresponding to 1,3-β-glucanases share no significant sequence similarity to r-bg. However, two sorghum ESTs (GI 6675471 and 9303089) were found to overlap and, when assembled, reconstitute a cDNA with 83% nucleotide sequence identity to r-bg.
In addition, BLASTX searches uncovered similarity between Bs1 (position 1310-1664; frame +3) and 1,4-β-xylan endohydrolase (xe) from barley, wheat, Arabidopsis, and many fungi and bacteria. Relative to the barley xe isoenzyme X-II (GI 1718238), the transduced region of xe (r-xe) corresponds to amino acid residues 1-35 fused to residues 86-118, which suggested that r-xe has sustained a deletion of sequences coding for 51 amino acids (Fig. 3). As in the case of bg, a maize cellular gene (c-xe) that contributed r-xe was not found in our searches.
Bs1 (position 1865-2521) shares similarity to a maize plasma membrane proton ATPase. A detailed description of r-pma and the identification of the maize c-pma have been reported previously (12)(13)(14).
In retroviruses and LTR retrotransposons containing env-like genes, the env is always located at the 3′-most region of the internal sequence, i.e. downstream from pol and just upstream from the 3′-LTR. BLASTX searches (position 2541-2853, frame +2) revealed similarity between the 3′-most region of the Bs1 internal sequence and members of a very large gene family that constitutes ~1% of the Arabidopsis genome (>200 members) (32,33). The first gene of this family to be characterized encodes a membrane-associated salt-inducible protein (GI 473873) (34). Proteins encoded by this family do not share significant overall amino acid sequence similarity. However, they do share the presence of a degenerate 35-amino acid repeat called the "PPR repeat" (33). Careful examination of the Bs1 sequence revealed that the similarity with the PPR proteins is restricted to two tandem repeats of the PPR motif (Fig. 4).
Data base searches using the Bs1 nucleotide sequence as a query revealed a maize EST (GI 7238199) with similarity to the Bs1 LTRs. The EST is 595 bp in length and was isolated from a Z. mays CV Ohio43 cDNA library of mixed stages of anther and pollen. The 5′-most 366 bp of this EST is nearly identical to the Bs1 sequence (99.5%; position 2834-3199). Specifically, the region of similarity extends from 68 bp upstream of the start of the 3′-LTR to 4 bp upstream of the end of the 3′-LTR (Fig. 5). The corresponding region within the EST is followed by a poly(A) tract (18 bp).
Maize c-bg and c-xe-To establish direct evidence that the r-bg and r-xe domains of Bs1 correspond to transduced portions of maize cellular genes, we cloned the maize c-bg and c-xe. Nucleotide sequence analysis suggests that c-bg contains two introns and three exons (Fig. 6). Sequence alignment of r-bg and c-bg (Fig. 6) indicates that they share ~81% sequence identity and that r-bg corresponds to a portion of the third exon of c-bg (position 1091-1350). Because r-bg shows slightly higher identity with the sorghum ESTs (83%, GI 6675471 and 9303089), it is possible that a different maize cellular gene contributed to r-bg. Regardless, the high similarity between the Bs1 sequence and both of the sorghum ESTs and c-bg confirms that r-bg corresponds to a transduced bg gene. Maize c-xe, on the other hand, appears to contain one intron and two exons. Compared with c-xe, r-xe corresponds to a part of the first exon (108 bp downstream from the ATG) fused to a part of the second exon (position 495-588 relative to the ATG) (Fig. 7). This suggests that r-xe has sustained a deletion that removed 44 bp from the first exon, all of the intervening intron, and 82 bp of the second exon (Fig. 7). This deletion thus eliminated 42 amino acid residues in r-xe compared with c-xe (Fig. 3). Overall, r-xe and c-xe share ~86% nucleotide sequence identity.
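The identity figures quoted above (approximately 81% for r-bg/c-bg and 86% for r-xe/c-xe) are percentages of matching positions in a pairwise alignment. A minimal Python sketch of one common way to compute such a value follows. The function name `percent_identity` is hypothetical, and the choice of denominator (all alignment columns, gaps included) is an assumption; the alignment programs used in this work may count differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment, not the r-bg/c-bg sequences.
    print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Counting gap columns in the denominator penalises insertions and deletions; excluding them would give a higher value for the same alignment, which is one reason identity figures from different tools are not always directly comparable.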
Relative to their cellular counterparts, the mutation patterns of r-bg and r-xe are different. Whereas r-xe has sustained a large deletion of 385 bp, r-bg has only a single nucleotide insertion. Base substitution patterns are also different; r-xe has sustained 34 nucleotide substitutions [...]. The mutations in r-xe have resulted in 19 amino acid changes (of the 67 amino acid residues contributed to ORF1 by r-xe); 9 of these mutations are changes into residues conserved in the barley XE protein (Fig. 3). Overall, integration of the c-xe sequence into Bs1 and subsequent mutations in r-xe seem to have occurred in a nonrandom fashion and conserved the XE frame within Bs1 ORF1 (Fig. 3). On the other hand, r-bg integration into Bs1 did not maintain the BG frame within ORF1 (Fig. 3). Furthermore, compared with c-bg, base substitutions have resulted in 43 amino acid changes (of 85), two of them being stop codons (Fig. 3).
[Figure legend] The alignment shows that although r-xe integration into ORF1 maintained the XE frame, r-bg integration did not maintain the BG frame. r-bg and r-xe nucleotide sequences were translated in the frames that encode β-1,3-glucanase and β-1,4-xylan endohydrolase, respectively. The sorghum EST (SoEST, an assembly of GI 6675471 and 9303089) and a hypothetical Arabidopsis protein (Atbg, GI 5042412) coding for β-1,3-glucanase are shown. HvX-II is the barley xylan endohydrolase isoform X-II (GI 1718238). Amino acid positions in ORF1 are numbered.
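The deletion figures reported for r-xe can be cross-checked with simple arithmetic. The Python sketch below uses only numbers given in the text (44 bp and 82 bp of exon sequence, 385 bp in total, 42 amino acids lost). The function name `deletion_summary` is hypothetical, and the intron length it derives is an inference from these numbers, not a value stated in the text.

```python
def deletion_summary(exon1_bp, exon2_bp, total_bp):
    """Summarise a deletion spanning two exon segments and the intron between them."""
    coding_bp = exon1_bp + exon2_bp
    codons, remainder = divmod(coding_bp, 3)
    return {
        "coding_bp_removed": coding_bp,
        "codons_removed": codons,
        # A multiple of three keeps the downstream reading frame intact.
        "in_frame": remainder == 0,
        # Inferred: whatever is left of the total must be intron sequence.
        "implied_intron_bp": total_bp - coding_bp,
    }


if __name__ == "__main__":
    print(deletion_summary(exon1_bp=44, exon2_bp=82, total_bp=385))
```

Under these assumptions the 126 bp of coding sequence removed correspond to exactly 42 codons, which agrees with the 42 amino acid residues reported as eliminated and with the observation that the XE frame is conserved across the deletion.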

DISCUSSION
Bs1 gag and env Domains-Our results reveal significant similarity between the amino-terminal-most 100 residues of ORF1 and GAG of the maize gypsy-like LTR retrotransposon magellan (Fig. 2), GAG of a non-LTR retrotransposon from B. mori (35), and the capsid protein of turnip crinkle virus (36). magellan was isolated as a recent insertion in the wx-M allele (37) and as an independent insertion in a maize gene that encodes the pl transcription factor (38) (GI 2343274). The recent insertion of magellan into wx and pl suggests it retains the coding capacity for all proteins necessary for retrotransposition, including GAG. magellan was established as a member of the gypsy/Ty3 LTR retrotransposons by a comparison of its integrase and reverse transcriptase domains with those of the yeast Ty3 and Lilium del gypsy-type LTR retrotransposons (37). Taken together, this suggests that a functional copy of Bs1 may belong to the gypsy/Ty3 class of LTR retrotransposons. The similarity between Bs1 and magellan GAG proteins is intriguing because homology between GAG proteins of different retrotransposons is usually low or not significant (39). This may indicate that Bs1 and magellan are, in fact, closely related.
Retroviruses and some LTR retrotransposons have an env domain located in the 3′ region of their internal sequence (5-9). env domains are highly variable and may have originated from independent transduction events (1). Examination of the deduced amino acid sequence of the Bs1 3′-most region reveals similarity to a large family of proteins that contain tandem PPR repeats. Given its location and shared structural similarities with the env domains of retroviruses and LTR retrotransposons (8), we suggest that this region may correspond to an env-like domain. For example, ENV proteins are typically plasma membrane-associated glycoproteins. The putative Bs1 ENV-like protein, as well as PPR proteins in general, contains several potential N- and O-glycosylation sites and has a hydrophobicity profile consistent with membrane-associated proteins, and at least some PPR proteins are targeted to the plasma membrane (34).² In addition, like ENV proteins of some LTR retrotransposons (e.g. SIRE-1) (5), the PPR motif is predicted to form transmembrane domains with α-helices and coiled coils (33). Furthermore, members of some retrovirus groups exhibit considerable diversity in receptor usage. This is because of the presence of stretches of variable sequence within otherwise conserved sequence (40-43). Variation has been suggested to result from both amino acid changes within and the recombination of these regions (41,42). Variation in the sequence, number, and organization of the PPR repeats has likewise been suggested to reflect an ongoing mechanism generating diversity (32).
Maize c-bg and c-xe-Our finding that Bs1 may have transduced portions of genes that encode 1,3-β-glucanase and 1,4-β-xylan endohydrolase prompted us to clone these genes from maize. β-Glucanases are enzymes that hydrolyze β-linked glucans and are implicated in various physiological processes that involve the structure and function of plant cell walls. Plant β-glucanases fall into two types, 1,3-β-glucanases and 1,3;1,4-β-glucanases. Based on their isoelectric properties, 1,3-β-glucanases may be further divided into two classes, namely acidic and basic. Although genes for both classes have been isolated from different plant species, only one maize cDNA encoding an acidic 1,3-β-glucanase has been identified (31). A comparison of the deduced translation product of c-bg with plant β-glucanases reveals that it is clearly a basic 1,3-β-glucanase (Fig. 3; data not shown).
² N. Elrouby and T. E. Bureau, unpublished data.
Genomic and cDNA sequences encoding 1,4-β-xylan endohydrolase have been isolated from barley and wheat (27,28,44). The barley cDNAs (GI 1718235 and GI 1718237) encode two isoenzymes (X-I and X-II) that share 87% identity at the amino acid level (27). Both the barley and wheat genomic sequences correspond to isoenzyme X-I (28,44). The maize c-xe is most closely related to the barley isoenzyme X-II (Fig. 3). Barley and wheat 1,4-β-xylan endohydrolases are involved in endosperm cell wall degradation through the hydrolysis of β-linked xylans (45).
Multiple Cellular Gene Transduction by Bs1-The capture of cellular genes by retroviruses is a consequence of the different features of the retroviral life cycle. As a result of inefficient transcript termination and polyadenylation at the 3′-LTR, ~15% of proviral transcripts are readthrough, spanning genomic sequences located downstream from the provirus (46,47). Joining the viral and genomic sequences is achieved at the RNA level through abnormal splicing between a donor site in the viral sequence and an acceptor site in the genomic sequence, or at the DNA level by a deletion event(s) followed by transcription from the 5′-LTR (48). A chimeric RNA molecule containing the cellular gene sequences and a normal viral RNA molecule are co-packaged into one viral particle (49). Strand switching by reverse transcriptase during reverse transcription may then result in non-homologous recombination events leading to the incorporation of the cellular gene sequences into the retroviral genome (47,50).
Bs1 represents the only clearly documented transduction event other than by vertebrate retroviruses and human L1 elements. We provide evidence that Bs1 has captured segments from at least three different cellular genes. Presumably, a chimeric RNA molecule must have been packaged together with a normal (i.e. functional) Bs1 RNA molecule into a virus-like particle. Evidence that LTR retrotransposons assemble into virus-like particles that are structurally and functionally analogous to retroviral cores has been clearly demonstrated (51)(52)(53). That a functional copy of Bs1 may exist in maize has also been proposed previously (13); this is suggested by the fact that Bs1 was first isolated as a recent insertion in the maize Adh1 gene (21) and indicated in our Southern hybridization results (see the Supplemental Material). However, lacking the sequence of a functional Bs1 element, we still cannot distinguish whether Bs1 is an LTR retrotransposon (perhaps gypsy-like) or a bona fide retrovirus.
Regardless of the true identity of Bs1, ORF1 is reminiscent of many oncogenes in several respects (11). First, like transduced cellular proto-oncogenes (oncogenes), the Bs1-transduced genes lack their native promoters and, if transcribed, would presumably be under the control of the Bs1 promoter located within its 5′-LTR. Second, like viral oncogenes, the Bs1-transduced genes lack introns and have sustained deletions. For example, r-bg corresponds only to a portion of the c-bg third exon, and the 5′ end (first and second exons and the intervening introns) and 3′ end of the gene were deleted (Fig. 6). Relative to c-xe, r-xe is composed of a portion of the first exon fused in-frame to a portion of the second exon, whereas the intervening intron and parts of the first and second exons are deleted (Fig. 7). The deletion is flanked by a 4-bp direct repeat (CACC). Direct repeats have been implicated in the formation of simple deletions during reverse transcription (54). Furthermore, r-pma corresponds to the first codon of exon 4 through the first 72 bp of exon 10 with all intervening introns spliced out and an internal deletion of 183 bp that removed most of exon 6 (12,13). Third, the transduction of cellular genes by Bs1 may result in a fusion protein with the Bs1 GAG sequence. Most viral oncogenes encode GAG fusion proteins that differ significantly in structure and, potentially, in function from the corresponding proto-oncogenes. Fourth, relative to their cellular gene counterparts, the Bs1-transduced genes have accumulated multiple point mutations, a consequence of reverse transcriptase being error-prone.
A novel aspect of Bs1 is that it has captured more than two cellular gene segments. Possible mechanistic scenarios include the following: (i) the additive acquisition of different gene segments in multiple successive events by one Bs1 element, (ii) separate transduction events by different Bs1 elements and subsequent recombination, or (iii) transduction involving the generation of a very long readthrough transcript spanning all three genes arranged in tandem. The second scenario is unlikely because Southern hybridization experiments do not reveal intermediate Bs1 molecules with only one or two of the transduced genes.² The third scenario also seems unlikely because readthrough of several transcript termination and polyadenylation signals would have to occur and because our polymerase chain reaction results do not suggest that the three genes are in close physical association (data not shown). Several acutely transforming retroviruses have been shown to harbor sequences from two cellular genes (55)(56)(57)(58). In these cases, the chimeric viral genome arose by independent recombination of the viral sequence with two distinct cellular loci (57). In Bs1, the sequence identity of the transduced segments with their cellular counterparts (81, 86, and 88% for r-bg/c-bg, r-xe/c-xe, and r-pma/c-pma, respectively) suggests that transduction probably started with r-bg, followed by r-xe and then r-pma. Hence, integration of the transduced sequences may have taken place at the 3′-most region of the internal sequence with subsequent transductions displacing previous ones upstream. Additionally, because the distance between r-bg and r-xe on the Bs1 sequence is relatively short (Fig. 3), it is possible that they may actually correspond to the transduced portion of a composite/chimeric gene that bears similarity to both glucanases and xylan endohydrolases. However, we could not find any examples of such a gene in GenBank™, and more importantly, we could not amplify gene sequences that span the two domains. The transducing efficiency of a retroelement may be regarded as a function of the efficiency of its polyadenylation and transcript termination. In the case of Bs1, the polyadenylation signal is non-canonical (5′-AATACA-3′) and may result in higher transducing efficiency. Regardless of the scenario, the transduction events must have occurred within a narrow evolutionary time window because the divergence between the transduced gene segments and their cellular counterparts is similar for the three genes.
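The ordering argument above can be restated as a small calculation. The Python sketch below ranks the three transduced segments by their nucleotide identity to the cellular counterparts, using the percentages given in the text. It assumes, as the argument does, that a lower identity reflects an earlier acquisition and that all three segments diverged at comparable rates; the function name `rank_by_identity` is hypothetical.

```python
def rank_by_identity(identity_percent):
    """Order segments from lowest to highest identity (presumed oldest first)."""
    return sorted(identity_percent, key=identity_percent.get)


if __name__ == "__main__":
    identity_percent = {"r-bg": 81, "r-xe": 86, "r-pma": 88}
    order = rank_by_identity(identity_percent)
    spread = max(identity_percent.values()) - min(identity_percent.values())
    print("presumed order of acquisition:", " -> ".join(order))
    print("spread in identity (percentage points):", spread)
```

The presumed order matches the 5′-to-3′ arrangement of the segments in Bs1, and the small spread between the three values is the basis for the statement that the events fell within a short evolutionary interval.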
We identified an EST (GI 7238199) that shares identity over its 5′ two-thirds with the Bs1 3′-LTR and a part of the internal sequence. The 3′ one-third of this EST shares identity with another EST (GI 5005895) that encodes aminotransferase. The two regions of the EST are separated by a poly(A) tract and a sequence that may correspond to the XhoI restriction site used for cloning (Fig. 5). Although it is possible that this EST represents a readthrough transcript of Bs1 into a 3′-flanking gene (i.e. aminotransferase), it is likely that the generation of this chimeric structure is merely a cloning artifact. Regardless, this EST represents evidence that Bs1 is expressed. Interestingly, this EST was isolated from a cDNA library of mixed stages of anther and pollen, indicating that Bs1 is expressed in the germ line. Together with the fact that maize is an outcrosser, germ line expression may have facilitated the fixation of Bs1 transduction and insertion events.
ORF1 Is a Novel Hybrid-The Bs1 ORF1 has been both predicted and shown to be translated in vitro (22,23). In vitro translation experiments also indicate that a longer polypeptide predicted for the frameshift fusion of ORF1 and ORF2 (which would span the region with similarity to the PPR-containing proteins) can be generated (22). This is striking because incorporation of r-bg, r-xe, and r-pma into Bs1 involved a complex pattern of mutations including abnormal splicing of transcripts, deletions, and numerous point mutations. A possible explanation is that there may be selective pressure to maintain this ORF. Unlike oncogenes, Bs1 does not seem to be associated with any disease phenotype because it is present in normal "wild type" maize cultivars and also in wild relatives of maize, the teosintes (20). The fact that selective constraints may have maintained ORF1, that several developmentally important cellular genes have been integrated, and that r-pma retains the ATPase domain of c-pma (12,13), suggest a possible function for Bs1 in normal maize development. In humans, modification of cell function by L1-mediated transduction has been proposed (18). L1 retrotransposition has also been involved in the mobilization of cellular sequences such as exons or promoters into other locations in the genome and hence was suggested to represent a general mechanism for the evolution of new genes by exon shuffling (17,19). In light of our findings, it is tempting to speculate that transduction events associated with Bs1 gave rise to a novel hybrid ORF and, therefore, a new gene.