Cloning and characterization of human eIF4E genes.

Two human eukaryotic initiation factor 4E (eIF4E) genes were isolated and characterized from placental and chromosome 4-specific genomic libraries. One of the genes (EIF4E1) contained six introns, but the other gene (EIF4E2) was intronless, flanked by Alu sequences and 14-base pair (bp) direct repeats, and terminated by a short poly(A) stretch, all characteristics of retrotransposons. Numerous additional intronless eIF4E pseudogenes were found, but unlike EIF4E2, all contained premature in-frame stop codons. The entire EIF4E1 gene spanned >50 kilobase pairs. The coding regions of these two genes differed in four nucleotide residues, resulting in two amino acid differences in the predicted proteins. The promoter of EIF4E1 has been characterized previously. The putative promoter of EIF4E2 contained no TATA box but did contain a transcription initiator region (Inr) and numerous other sequence motifs characteristic of regulated promoters. EIF4E2 contained only two of the three polyadenylation signals present in EIF4E1. Evidence for transcription of both genes was obtained from primer extension, S1 mapping, ribonuclease protection, and reverse transcriptase-polymerase chain reaction experiments. Transcription was found to initiate 19 bp upstream of the translational initiation codon in the case of EIF4E1 and 80 bp in the case of EIF4E2. The two genes were differentially expressed in four human cell lines, Wish, Chang, K562, and HeLa.

The best understood mechanisms for the regulation of protein synthesis involve modifications in the levels or activities of the initiation factors (1, 2). Changes involving eukaryotic initiation factor (eIF)¹ 2 and associated proteins affect binding of the initiator tRNA to the 40 S ribosomal subunit and occur in response to heat shock, virus infection, deprivation of nutrients, and other conditions. Changes in the eIF4 factors affect binding of mRNA to the 43 S initiation complex and occur in response to mitogens, fertilization, and other conditions. The eIF4 factors consist of the ATP-dependent RNA helicase eIF4A (3, 4), the RNA-binding protein eIF4B (5, 6), the cap-binding protein eIF4E (7), and eIF4G, which has specific binding sites for eIF3, eIF4A, eIF4E, and the poly(A)-binding protein (8-10).
Mammalian eIF4E is a 25-kDa protein of known three-dimensional structure (11) that binds to the mRNA cap (7), to eIF4G (3, 4), to eIF4A (12), and to the eIF4E-binding proteins (13). Its ability to function in protein synthesis is regulated by at least three processes. First, the phosphorylation of eIF4E correlates positively with the rate of translation in a large number of systems (1) and increases the protein's affinity for cap analogues by 3- to 4-fold (14). Second, eIF4E availability is regulated by eIF4E-binding proteins, the phosphorylation of which, in response to insulin and other mitogens, releases them from eIF4E and permits eIF4E binding to eIF4G (13). Third, eIF4E levels are regulated at the transcriptional level. eIF4E mRNA is increased by overexpression of c-Myc as well as transformation of cells by v-Src and v-Abl (15). eIF4E mRNA levels are also elevated in a variety of cells that have been oncogenically transformed by in vivo transfection, viral infection, or chemical mutagenesis (16).
Changes in the intracellular levels of eIF4E have a profound effect on cellular growth control. Ectopic overexpression of eIF4E leads to accelerated cell growth (17), transformation in culture and tumorigenesis in nude mice (18), prevention of apoptosis in growth factor-restricted fibroblasts (19), and elevated intracellular levels of growth-regulated proteins such as cyclin D1, c-Myc, ornithine decarboxylase, ornithine aminotransferase, P23, vascular endothelial growth factor, and fibroblast growth factor (20-26). Reduction in intracellular eIF4E levels by expression of antisense RNA (27) results in phenotypic reversal of ras-transformed fibroblasts (28, 29). Naturally occurring breast (30, 31) and head-and-neck (32) tumors express elevated levels of eIF4E. The eIF4E gene is increasingly being referred to as a proto-oncogene (30-34).
Previous studies have resulted in the cloning and sequencing of human eIF4E cDNA (35) and a 1.4-kb fragment from the 5′-end of the eIF4E gene (36). To shed more light on the expression of eIF4E, especially at the transcriptional level, we have determined the entire gene structure for human eIF4E. Surprisingly, two genes were found, one containing introns and the other not. Furthermore, both genes appear to be expressed in human cells, at least at the transcriptional level.
¹ The abbreviations used are: eIF, eukaryotic initiation factor; PCR, polymerase chain reaction; kb, kilobase pair; bp, base pair; nt, nucleotide; RT-PCR, reverse transcriptase-polymerase chain reaction; PIPES, 1,4-piperazinediethanesulfonic acid; Inr, initiator region.

EXPERIMENTAL PROCEDURES
Materials-Two ... cells from conjunctiva, K562 cells from chronic myelogenous leukemia, and HeLa cells from epithelioid carcinoma of the cervix were purchased from the ATCC. All cell lines were grown in Dulbecco's modified Eagle's medium with 10% calf serum in 5% CO₂ at 37°C and harvested after 2-3 days.
Screening of Human Genomic DNA Libraries-Recombinant phage were propagated in the host bacterial strains NM538 and LE392 using standard protocols (37). Plaques were screened by colony hybridization and/or PCR using primers that spanned intron/exon junctions. For screening by colony hybridization, a plasmid (pTCEEC) containing human eIF4E cDNA (38) and PCR fragments derived from it were labeled with [α-³²P]dCTP by nick translation. Prehybridization was performed in 50% formamide at 42°C for 2 h, radiolabeled probes were added, and hybridization was carried out for ≥16 h. Membranes were washed twice in 2× SSC, 0.2% SDS at room temperature for 30 min each and once in 0.1× SSC, 0.2% SDS at 65°C for 30 min, where 1× SSC is 0.15 M sodium chloride and 15 mM sodium citrate. The filters were exposed to Kodak x-ray film or analyzed with a PhosphorImager (Molecular Dynamics, Sunnyvale, CA). For screening by PCR, the library phage lysates were divided into 50 pools (10⁶ plaque-forming units each), and 1 µl of phage lysate from each pool was used as a PCR template. The positive pools were plated and allowed to grow for 8 h, and the agar was again subdivided into 50 pools and screened by PCR. This process was performed a total of three to four times until a single positive plaque was obtained. DNA from selected recombinant phage was prepared from bacterial lysates (37). DNA inserts were mapped with restriction enzymes, and those containing exons were identified by Southern blotting. Exon-containing restriction fragments were excised from agarose gels, subcloned into pBluescript II (−) (Stratagene), and sequenced to identify exon/intron junctions.
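The salt concentrations of the two wash buffers follow directly from the 1× SSC composition given above. A minimal sketch of that arithmetic is shown here; the function name `ssc_concentrations` is illustrative and is not part of the original protocol.

```python
# Composition of 1x SSC as defined in the text, in millimolar.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_concentrations(fold: float) -> dict:
    """Return solute concentrations (mM) for SSC at the given fold-strength."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    # Rounding only suppresses floating-point noise in the product.
    return {solute: round(fold * mm, 6) for solute, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    for fold in (2, 0.1):
        print(f"{fold}x SSC:", ssc_concentrations(fold))
```

On this calculation, the room-temperature washes contain 300 mM sodium chloride and 30 mM sodium citrate, whereas the final 65°C wash contains 15 mM and 1.5 mM, respectively, which makes it the more stringent step.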
Screening of the EMBL3-SP6/T7 library using radiolabeled human eIF4E cDNA yielded positive plaques corresponding to an intronless form of the eIF4E gene, here named EIF4E2. Screening of the same library with a hybridization probe synthesized with primers 9 and 10 (Table I) yielded the recombinant phage 4E1-A. Two human chromosome-specific genomic libraries in Charon 21 were screened using radiolabeled human eIF4E cDNA, one from chromosome 4 and one from chromosome 20. No positive plaques were obtained from the chromosome 20-specific library, but a number of positive plaques were obtained from the chromosome 4-specific genomic library and were termed 4E1-B, 4E1-C, and 4E1-D. A second chromosome 4-specific Charon 21 library was screened with cDNA probes and yielded 4E1-E. The LAO4NLO1 library was screened by PCR as described above using primers 3 and 7 and yielded 4E1-F. The same strategy was applied to this library but with primers 11 and 12, leading to the isolation of 4E1-G, and to the FIX II library with primers 1 and 8 leading to 4E1-H.
Southern Blot Analysis-DNA from purified positive plaques (1 µg) was separated on agarose gels and transferred to nitrocellulose (37). DNA probes were labeled with ³²P by nick translation. Prehybridization and hybridization were performed as described above for colony hybridization.
DNA Sequence Analysis-DNA fragments were subcloned into pBluescript II vectors, and sequences were determined by dideoxy chain termination (39) using SK and KS primers (Stratagene) as well as exon-specific oligonucleotide primers. For both EIF4E1 and EIF4E2, the numbering system is based on human eIF4E cDNA (35), i.e. the location of the first ATG in the coding region is designated +1, with upstream nucleotides having negative numbers and downstream nucleotides positive numbers. Nucleotides in introns are not numbered.
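The numbering convention can be made concrete with a short sketch. It assumes that there is no position 0, so that the nucleotide immediately upstream of the A of the first ATG is −1; this is the usual convention and is consistent with the 13-nt region upstream of the ATG being described as ending at nt −13, but it is not stated explicitly in the text. The function name and the 0-based indexing of an intron-free sequence are illustrative.

```python
def cdna_position(index: int, atg_index: int) -> int:
    """Convert a 0-based index in an intron-free sequence to cDNA numbering.

    atg_index is the 0-based index of the A of the first ATG, which is
    numbered +1. Assumption: there is no position 0, so the nucleotide
    just upstream of the ATG is -1.
    """
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    atg = 80  # hypothetical location of the ATG in a toy sequence
    for i in (0, 79, 80, 82):
        print(i, "->", cdna_position(i, atg))
```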
RNA Isolation-Total RNA was prepared from either human placental tissue or human cell lines by the guanidine thiocyanate method (40). In the latter case, ~10⁷ cells were washed with phosphate-buffered saline before homogenization.
Primer Extension Analysis-Oligonucleotide primers complementary to eIF4E mRNA were end-labeled with T4 polynucleotide kinase (Promega) and [γ-³²P]ATP (37). Primer (300,000 cpm) and total RNA (~100 µg) were heated at 90°C for 5 min in 30 µl of Hybridization Buffer (80% formamide, 40 mM PIPES, pH 6.4, 400 mM NaCl, and 1 mM EDTA) and then incubated at 30°C for 16 h. Hybrids were precipitated in ethanol and dissolved in 30 µl of 50 mM Tris-HCl, pH 8.3, 7.5 mM MgCl₂, 0.5 mM dNTP, 2 mM dithiothreitol, and 800 units/ml RNasin. Extension was performed with 400 units/ml Moloney murine leukemia virus reverse transcriptase (Life Technologies, Inc.) for 1 h at 42°C. After ethanol precipitation, products were separated on sequencing gels.
S1 Analysis-A single-stranded antisense DNA probe corresponding to the sequence between −1000 and +25 of EIF4E1 was produced by asymmetric PCR (41) from the recombinant phage 4E1-H using primers 1 and 4 (Table I). The DNA was electrophoretically separated on 1.2% agarose gels, purified using a Geneclean kit, and then end-labeled with T4 polynucleotide kinase (Promega) and [γ-³²P]ATP (37). Total RNA (~100 µg) was hybridized with the probe (50,000-100,000 cpm) at 30°C for 16 h in 30 µl of Hybridization Buffer and then digested with 150-200 units of S1 nuclease (Sigma) for 1 h at 30°C. Protected DNA fragments were separated on 8% sequencing gels (37).
Ribonuclease Protection Analysis-The sequence between −312 and +184 of EIF4E2 was amplified from phage 4E2 by PCR using primers 2 and 3 (Table I). PCR was performed using the following program: after an initial heating step (95°C, 10 min), 2 units of Taq DNA polymerase were added to the 100-µl reaction, after which 35 cycles of PCR were carried out (95°C, 30 s; 57°C, 30 s; 72°C, 2 min). The amplified PCR product was purified by agarose gel electrophoresis followed by treatment with Geneclean and then subcloned into the pGEM-T vector (Promega). RNA was transcribed in vitro from pGEM-T linearized by EcoRI and radiolabeled with [³²P]UTP with a riboprobe transcription system (Promega). A control RNA was transcribed in vitro from HindIII-linearized pTCEEC. Ribonuclease protection assays were performed as described previously (40).
RT-PCR-Total RNA from human placental tissue (100 µg) was treated with RNase-free DNase (Promega) at 37°C for 30 min. After inactivation of the DNase by heating at 70°C for 15 min, the RNA was hybridized with primer 3 (Table I) in Hybridization Buffer at 30°C overnight. Primer extension was performed as described above. RNA was removed by DNase-free RNase (Promega), and the remaining DNA from 1 µl of the primer extension reaction mixture was amplified by PCR as described above, except that the annealing temperature was 65°C.

RESULTS AND DISCUSSION
Screening of human genomic libraries yielded two genes capable of encoding eIF4E, one of which (EIF4E1) contained introns and one of which (EIF4E2) did not (Fig. 1). Although a previous study reported eIF4E-like genes on chromosomes 4 and 20 (42), the fact that most of the fragments in Fig. 1A were cloned from chromosome 4-specific libraries strongly argues that the EIF4E1 gene is on chromosome 4. The entire length of cloned DNA comprising the EIF4E1 gene was 50 kb. The EIF4E2 gene was flanked by three Alu sequences and two 14-bp direct repeats and contained a 3′-terminal poly(A) stretch (Fig. 2). All of these features are characteristic of retrotransposons (43). Additional screening yielded DNA fragments representing intronless genes with more substantial differences from the cDNA and containing additional in-frame stop codons.
Intron-Exon Structure of EIF4E1-Comparison of the sequence of EIF4E1 with that of the cDNA (35) indicated that the gene is organized into seven exons and six introns (Fig. 1A). The regions in and around the exons were sequenced (Fig. 2). Exon 1 is the smallest (37 bp) and exon 7 is the largest (1.3 kb) (Table II). The introns of this gene range from 1.2 to more than 10 kb and include only two of the three possible types (44), types 0 and 2 (Table III). All the exon/intron junction sequences conform to the GT/AG rule (44) (Table IV).
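The three intron types (44) are defined by where an intron falls relative to the reading frame: type 0 lies between two codons, type 1 follows the first nucleotide of a codon, and type 2 follows the second. As a minimal sketch, the type is the remainder, modulo 3, of the number of coding nucleotides that precede the intron. The function name and the example counts below are hypothetical and are not taken from Table III.

```python
def intron_type(coding_nt_before_intron: int) -> int:
    """Classify an intron as type 0, 1, or 2.

    coding_nt_before_intron is the number of coding nucleotides, counted
    from the A of the initiation codon, that lie upstream of the intron.
    """
    if coding_nt_before_intron < 0:
        raise ValueError("count of coding nucleotides cannot be negative")
    return coding_nt_before_intron % 3


if __name__ == "__main__":
    # Hypothetical counts chosen only to show each outcome.
    for n in (30, 31, 32):
        print(n, "coding nt upstream -> type", intron_type(n))
```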
The exon structure of EIF4E1 can be compared with the recently published three-dimensional structure of eIF4E (11) with respect to the theory that protein-coding genes evolved by assembly of exons encoding functional domains (45, 46). Results of mass spectrometry and x-ray diffraction suggested that amino acids 1-35 are disordered in the absence of other proteins. These are encoded by exons 1 and 2 (Table II), which may have been added after the evolution of a core cap-binding protein for the purpose of binding other proteins (see Introduction). Exon 3 encodes most of β strand 1 (S1), the 1-2 loop, β strand 2 (S2), and most of α helix 1 (H1). These structurally contiguous elements make up a region of the polypeptide chain that constitutes one side of the protein and include one of the two conserved Trp residues that "sandwich" the 7-methylguanine moiety, Trp-56. Exon 4 encodes essentially only one element, S3. Exon 5 encodes the 3-4 loop, S4, and most of H2. This collection of elements contains the other conserved sandwich Trp residue, Trp-102, as well as Glu-103, which forms hydrogen bonds to the N-1 and amino groups of 7-methylguanine. A similar set of interactions occurs between m⁷GMP and
a Type 0 introns occur between two codons; type 1 introns interrupt a codon between the first and second nucleotides, and type 2 introns interrupt a codon between the second and third nucleotides (44).
FIG. 2. Alignment of the nucleotide sequences of EIF4E1 and EIF4E2. A, the EIF4E1 gene. B, the EIF4E2 gene. The entire sequence for EIF4E2 from −712 to +1894 is given (see "Experimental Procedures" for numbering system), but the introns have been omitted from the EIF4E1 sequence. Exon-exon junctions in EIF4E1 are marked by vertical lines. The proposed major transcriptional initiation sites of the two genes at −80 and −19 are indicated by arrows. The transcriptional initiator region (Inr), translational initiation codons, and polyadenylation signals are boxed. Other consensus motifs within the putative promoter region of EIF4E2 are underlined. Three Alu repeats in EIF4E2 are indicated in boldface, and a 14-bp direct repeat is doubly underlined. The sequence of a 2.7-kb portion of 4E2 has been deposited in GenBank™/EMBL Data Bank (accession number, M77222).

Comparison of the EIF4E1 and EIF4E2 Genes-Sequence alignment of the exonic portions of the EIF4E1 gene with the entire EIF4E2 gene indicates no differences in the 13-nt region immediately upstream of the ATG, four single base differences in the 651-nt coding region, and five single base differences in the 980-nt 3′-untranslated region up to nt 1641 (Fig. 2). In the 3′-untranslated region of the EIF4E2 gene there is also a 10-nt insertion of T residues after nt 847 and the absence of a 9-nt stretch from nt 1238 to 1246. Upstream of nt −13 and downstream of nt 1641 there is no similarity between the two genes. As previously reported (36), the promoter of EIF4E1 lacks a canonical TATA box but includes two consensus sites for c-Myc. The putative promoter of EIF4E2 also lacks a consensus TATA box but, importantly, contains a consensus initiator region (Inr), TCATACC. A strong match to the Inr consensus sequence is commonly seen in TATA-less promoters (48) and may serve as a binding site for the YY1 protein (49). Other consensus motifs within this region include GATA1, AP1, AP4, GFI1, NF-κB, STAT, c-Myb, SRY, SREBP1, and HFH2 (Fig. 2). The EIF4E2 gene contains only two of the three polyadenylation signals present in the eIF4E cDNA (35).
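A search for an Inr-like element can be sketched as a match against a degenerate consensus. The sketch below assumes the commonly cited mammalian Inr consensus YYANWYY (Y = C or T, W = A or T, N = any base); the text does not state which consensus was used, so this choice, the function name, and the flanking bases in the example are illustrative only. The heptamer TCATACC reported for EIF4E2 satisfies this pattern.

```python
import re

# IUPAC nucleotide codes needed for the assumed consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "W": "[AT]", "N": "[ACGT]",
}


def find_motif(sequence: str, consensus: str = "YYANWYY") -> list:
    """Return 0-based start indices of every match to a degenerate motif."""
    pattern = "".join(IUPAC[symbol] for symbol in consensus.upper())
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


if __name__ == "__main__":
    # TCATACC is from the text; the flanking GG bases are hypothetical.
    print(find_motif("GGTCATACCGG"))
```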
Both genes encode proteins of 217 amino acids with only two amino acid differences, Glu versus Lys at position 19 and Arg versus Trp at position 61. The location of the first of these sites in the three-dimensional structure of the protein is not known, but the second occurs at the beginning of S1. Amino acid residues at positions 19 and 61 do not make contact with m⁷GDP. Also, neither falls into the group of amino acid residues previously shown, by site-directed mutagenesis, to be important for cap binding (50-54). Finally, neither Glu-19 nor Arg-61 is universally conserved among eIF4E sequences. For these reasons, we predict that eIF4E-2 would be a functional protein.
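The relationship between the nucleotide and amino acid differences (four base changes in a 651-nt, 217-codon coding region, only two of which alter the protein) can be illustrated by comparing two coding sequences codon by codon. The sketch below uses the standard genetic code; the two short sequences are hypothetical toy examples, not the EIF4E1 or EIF4E2 sequences, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def compare_cds(cds_a: str, cds_b: str):
    """Return 1-based nucleotide differences and (position, aa_a, aa_b) changes."""
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be aligned and of equal length")
    nt_diffs = [i + 1 for i, (x, y) in enumerate(zip(cds_a, cds_b)) if x != y]
    prot_a, prot_b = translate(cds_a), translate(cds_b)
    aa_diffs = [
        (i + 1, x, y) for i, (x, y) in enumerate(zip(prot_a, prot_b)) if x != y
    ]
    return nt_diffs, aa_diffs


if __name__ == "__main__":
    # Hypothetical toy sequences: three base changes, one of them silent.
    print(compare_cds("ATGGAACGGCTG", "ATGAAATGGCTA"))
```

In the toy example, single base changes convert a Glu codon to a Lys codon and an Arg codon to a Trp codon, while the third change leaves a Leu codon synonymous.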
Transcription Initiation Sites-Previous studies have indicated that transcription for mammalian eIF4E mRNA begins between −7 and −27: (i) the cloned human eIF4E cDNA begins at −18, although since this is a C residue, the mRNA is more likely to begin at the G at −19 (35); (ii) the cloned mouse eIF4E cDNA begins at −19 (55); and (iii) primer extension of mRNA from one mouse and one human cell line indicates a major start site at −16 and minor start sites at −7, −24, and −27 (55). The putative promoter of EIF4E2 bears no similarity to that of EIF4E1 and hence would be unlikely to initiate transcription in the same region. To determine whether EIF4E2 was transcribed, we performed primer extension with a primer complementary to the 5′-coding region of eIF4E mRNA, which should therefore produce extension products from both EIF4E1 and EIF4E2 transcripts. Primer extension using human placental RNA and primer 4 (Table I) produced major bands corresponding to initiation sites at −80 and −76 as well as minor bands at −19 and −20 (Fig. 3A). Primer extension with RNAs isolated from several human cell lines produced similar results (Fig. 3B). HeLa cell RNA (H) gave an initiation site at −80, the same as placental RNA, but Wish cell RNA (W) also gave minor products corresponding to initiation of transcription at −56 and −19. Chang cell RNA (C) yielded much more product corresponding to −19, less product from −80, and a small amount of product corresponding to −46 and −56. Finally, K562 cell RNA (K) produced even less of the −80 product, more of the −46 product, and a great deal of the −19 product.
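The start sites are read from the lengths of the extension products. A minimal sketch of that conversion follows, under two assumptions that are not stated outright in the text: that the 5′ end of primer 4 corresponds to position +25 (inferred from the −1000 to +25 S1 probe made with this primer) and that the numbering has no position 0. The function name is illustrative.

```python
def span_length(upstream: int, downstream: int) -> int:
    """Number of nucleotides from upstream to downstream, inclusive.

    Positions use cDNA numbering in which the A of the ATG is +1 and the
    preceding nucleotide is -1 (assumption: there is no position 0).
    """
    if upstream == 0 or downstream == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if upstream > downstream:
        raise ValueError("upstream must not exceed downstream")
    length = downstream - upstream + 1
    if upstream < 0 < downstream:
        length -= 1  # correct for the missing position 0
    return length


if __name__ == "__main__":
    primer_5_prime_end = 25  # assumed position of the 5' end of primer 4
    for start in (-80, -76, -56, -46, -19):
        print(start, "->", span_length(start, primer_5_prime_end), "nt")
```

Under these assumptions, a start at −80 would give a 105-nt product and a start at −19 a 44-nt product.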
To determine the origins of the various primer extension products, we performed S1 and RNase protection analysis using probes based on EIF4E1 and EIF4E2 sequences. S1 mapping with RNA from Wish cells using a probe spanning from −1000 to +25 of the EIF4E1 gene produced a cluster of fragments corresponding to protection from −11 to −14 (Fig. 4A). As the EIF4E1 and EIF4E2 genes have the same sequences immediately downstream of −13 (Fig. 2), fragments of this size could be derived from either EIF4E1 transcripts initiated in the region of −11 to −14 or EIF4E2 transcripts initiated at this point or upstream. This result does, however, indicate that there are no unspliced transcripts from EIF4E1 beginning upstream of this point. It is possible, however, that a transcript is initiated upstream of −11 to −14 in EIF4E1 and then spliced, with the 3′ splice site being located at −11 to −14. To test this possibility, we performed Southern analysis of restriction fragments of 4E1-H (Fig. 1A), which contains ~20 kb upstream of the coding region, using a probe consisting of the −80 primer extension product eluted from the polyacrylamide gel. However, no positive hybridization was observed, indicating the lack of an additional upstream exon (data not shown).
We tested directly for transcripts from EIF4E2 by ribonuclease protection analysis of human placental RNA using a radiolabeled RNA probe from −113 to +184 of the EIF4E2 gene (Fig. 4B, lane 1). A control RNA, transcribed in vitro from the cloned eIF4E cDNA contained in plasmid pTCEEC (38), produced a protected fragment at −13 as expected (lane 2). Placental RNA produced a number of protected fragments, the major ones corresponding to the probe terminating at −80 and −13 (lane 3). The −80 fragment is likely to be the same as the −80 primer extension product (Fig. 3), suggesting that transcription from EIF4E2 begins at −80. The −13 protection fragment is likely to have been produced by EIF4E1 transcripts, since the sequence of EIF4E1 and EIF4E2 is common only downstream of −13.
To confirm expression of the EIF4E2 gene, we performed RT-PCR using primers 3-6 (Table I). Primer 3 was used to make eIF4E-specific cDNA from human placental RNA. The sequence from −80 to +25 of the EIF4E1 gene can be specifically amplified by PCR using primers 4 and 6, and the sequence from −66 to +25 of the EIF4E2 gene can be specifically amplified using primers 4 and 5 (Fig. 5A). A product was obtained with primers 4 and 5 (Fig. 5B, lane 3) but not with primers 4 and 6 (lane 6), indicating that an mRNA is present in placental RNA that is initiated at or upstream of the −66 position of EIF4E2 but not at or upstream of position −80 of EIF4E1. This experiment was repeated using the −80 primer extension product of Wish cell RNA as template instead of placental RNA (Fig. 5C). Primers 4 and 5 allowed synthesis of a DNA product of the correct size (lane 2) but primers 4 and 6 did not (lane 4), providing direct proof that the −80 primer extension product of Fig. 3 is derived from EIF4E2. Finally, the PCR product in Fig. 5C, lane 2, was purified from the agarose gels and sequenced. The sequence matched the 5′-region of the EIF4E2 gene (Fig. 2). All of these results are evidence that mRNA transcribed from position −80 of EIF4E2 is present in human cells.
Additional evidence that these upstream sequences represent bona fide promoter regions came from an analysis of their chromatin structure. Micrococcal nuclease digestion products derived from the upstream regions of EIF4E1 and EIF4E2 were found to be heterogeneous by ligation-mediated PCR, despite the fact that bulk chromatin in the same digests gave rise to typical nucleosome-protected ladders (data not shown). This result is consistent with these regions being sites of nucleosome disruption, a hallmark of transcriptional regulatory regions (56).
The evidence presented here supports the view that both EIF4E1 and EIF4E2 are expressed at the level of mRNA. Two major primer extension products were detected, corresponding to transcription starts at −80 and −19 (Fig. 3). The data indicate that EIF4E1 is initiated at −19 and that EIF4E2 is initiated at −80. Other examples have been described in which both intron-containing and intronless genes code for the same protein (43, 57-59), and in at least two cases, both genes yield functional proteins (43, 59). It is unclear, however, why human cells would express two functional genes for eIF4E. The fact that the promoter of EIF4E1 contains c-Myc-binding elements (36) whereas that of EIF4E2 does not suggests that the former may be inducibly expressed, e.g. during rapid cell growth, while the latter may be constitutively expressed. The differences in the relative amounts of the −80 and −19 primer extension products in various cultured human cell lines (Fig. 3B) suggest that the two genes may be differentially transcribed in response to factors such as growth rate or cell lineage. The differential expression of EIF4E1 and EIF4E2 may also explain why EIF4E2 cDNA was not initially cloned (35); fibroblasts and lymphocytes, the sources for mRNA used in the cloning of human eIF4E cDNA, may express EIF4E1 predominantly. Similarly, the cultured cells chosen for primer extension analysis in the previously published study (55) may express EIF4E1 predominantly.
FIG. 4. S1 and RNase protection analysis of EIF4E1 and EIF4E2 transcripts. A, S1 nuclease analysis performed with no RNA (N) or total RNA (R) from Wish cells using a single-stranded DNA probe from the 5′-region of the EIF4E1 gene. The numbers refer to nucleotide positions (Fig. 2). C, T, A, and G are a sequencing ladder derived from EIF4E1. B, RNase protection performed with an RNA probe from the 5′-region of the EIF4E2 gene and human placental RNA. pTCEEC RNA was transcribed in vitro from a linearized plasmid containing eIF4E cDNA (38) and corresponds to an EIF4E1 transcript. Numbers on the left correspond to nucleotide positions (Fig. 2). Numbers on the right give the sizes in nucleotides of RNA markers (M).
Table I. B, PCR was performed using the indicated primers and templates consisting of human placental total RNA, reverse-transcribed with primer 3 (R), 4E1-H (1), and 4E2 (2). The products were separated on 2% agarose gels. M, DNA markers. C, PCR was performed as in B except that in the indicated lanes (P) the template was the primer extension product terminating at −80 from Wish cell RNA, excised from a sequencing gel similar to that shown in Fig. 3B. The products were separated on 6% polyacrylamide gels.