Originally published In Press as doi:10.1074/jbc.M111134200 on January 30, 2002

J. Biol. Chem., Vol. 277, Issue 15, 12559-12571, April 12, 2002

Mass Spectrometric Analysis of the N Terminus of Translational Initiation Factor eIF4G-1 Reveals Novel Isoforms*

Christopher A. Bradley‡, Júlio César Padovan§, Timothy L. Thompson¶, Clint A. Benoit¶, Brian T. Chait§, and Robert E. Rhoads‡‖

From the ‡Department of Biochemistry and Molecular Biology and ¶Research Core Facility, Louisiana State University Health Sciences Center, Shreveport, Louisiana 71130-3932 and the §Laboratory for Mass Spectrometry and Gaseous Ion Chemistry, Rockefeller University, New York, New York 10021-6399

Received for publication, November 20, 2001, and in revised form, January 10, 2002

    ABSTRACT

In eukaryotes, translation initiation factor 4G (eIF4G) acts as the central binding protein for an unusually large number of proteins involved in mRNA metabolism. Several gene products homologous to eIF4G have been described, the most studied being eIF4G-1. By its association with other initiation factors, eIF4G-1 effects mRNA cap and poly(A) recognition, unwinding of secondary structure, and binding to the 43S initiation complex. Multiple electrophoretic isoforms of eIF4G-1 are observed, and multiple cDNAs have been reported, yet the relationship between the two is not known. We report here a new cDNA for eIF4G-1, present as a previously unidentified human expressed sequence tag, that extends the long open reading frame, provides a new in-frame initiation codon, and predicts a longer form of eIF4G-1 than reported previously. eIF4G isoforms from human K562 cells were cleaved with recombinant Coxsackievirus 2A protease and the N-terminal domains purified by m7GTP-Sepharose chromatography and polyacrylamide gel electrophoresis. Proteins were digested with proteolytic enzymes and peptide masses determined by matrix-assisted laser desorption ionization-time of flight mass spectrometry. In selected cases, peptides were sequenced by electrospray-mass spectrometry fragmentation. This identified the N termini of the three most abundant eIF4G-1 isoforms, two of which had not previously been proposed. These proteins appear to have been initiated from three different AUG codons.

    INTRODUCTION

The three stages of protein synthesis are catalyzed by three groups of proteins: initiation, elongation, and termination factors (1). Initiation is characterized by formation of a series of initiation complexes, each catalyzed by a different subset of initiation factors. Recruitment of mRNA to the 43 S initiation complex to form the 48 S initiation complex involves eIF3,¹ PABP, and the eIF4 proteins. eIF3 is a 520-kDa multimer that is required for both Met-tRNAi and mRNA binding. PABP is a 70-kDa protein that specifically binds poly(A) and homo-oligomerizes. The eIF4 factors consist of: eIF4A, a 46-kDa RNA helicase; eIF4B, a 70-kDa RNA-binding and -annealing protein that stimulates eIF4A; eIF4E, a 25-kDa cap-binding protein; and eIF4G, a group of proteins of 154-180 kDa that form specific complexes with all of the other proteins known to be involved in mRNA recruitment.

Proteins sharing eIF4G homology represent the products of at least three different genes in mammalian cells (2-4). One of these has been mapped to the chromosomal location 3q27-qter (5). In accordance with the nomenclature system proposed for translation factors (6), the protein product of the gene at 3q27-qter is referred to as eIF4G-1, whereas its two homologues are eIF4G-2 (also known as p97, eIF4G2, DAP5, and NAT1; Refs. 3, 7, and 8) and eIF4G-3 (also known as eIF4G3 and eIF4GII; Ref. 4). Most studies reported to date have concerned eIF4G-1.

eIF4G appears to serve as a nucleation site for co-localization of an unusually large number of proteins involved in mRNA metabolism. Combining data from yeast, plant, and mammalian eIF4Gs, these include (in approximate order of binding sites on eIF4G from N to C termini): the influenza protein NS1 (9), the cytoplasmic poly(A)-binding protein PABP (10, 11), the rotavirus protein NSP3 (12), the decapping protein Dcp1 (13), the cytoplasmic cap-binding protein eIF4E (14-16), the nuclear cap-binding protein CBP80 (17, 18), the initiation factor eIF3 (14, 19), the RNA helicase eIF4A at two distinct sites (14, 20, 21), RNA itself (22-24), the heat shock proteins hsp27 (25) and hsp70 (26), and the eIF4E kinase Mnk1 (27-29). In some cases it has been shown that eIF4G-1 not only binds proteins but also affects their activities or binding of other proteins (19, 21, 30).

The mRNA recruitment step is rate-limiting for initiation under normal cellular conditions and appears to be highly regulated (31, 32). The best studied regulatory mechanism involving eIF4G is its cleavage by 2A protease of entero- and rhinoviruses and L protease of foot-and-mouth-disease virus. Upon infection of mammalian cells with these picornaviruses, most host protein synthesis is shut off coincident with the appearance of viral proteins (33). This is thought to be mediated by a switch from cap-dependent to cap-independent translation. Complexes containing eIF4G restore cap-dependent translation in lysates of poliovirus-infected cells (34-36). eIF4G was discovered as a result of its proteolysis coincident with the loss of cap-dependent initiation during poliovirus infection (37). eIF4G is separated into two functional domains, an N-terminal cleavage product (cpN) that binds eIF4E and PABP, and a C-terminal cleavage product (cpC) that binds eIF4A and eIF3 (14, 38). Initiation of picornaviral mRNA translation is via an internal ribosome entry site (39, 40). Cleavage of eIF4G drastically inhibits translation of capped mRNAs in vitro, whereas internal initiation and initiation of uncapped mRNAs are either unaffected or even stimulated (41-43).

Multiple isoforms of eIF4G-1 are observed by SDS-PAGE, but the origin of these is not known. However, an analysis using 2A protease cleavage, SDS-PAGE, and antibodies directed against different domains of eIF4G suggested that the heterogeneity was attributable to the ~50-kDa cpN domain (38). Digestion with L protease further delimited the source of heterogeneity to the N-terminal ~30 kDa cpN1 domain (14). Multiple cDNAs for human eIF4G-1 have also been reported. The initial cloning, based on two overlapping cDNAs from fetal and adult brain, predicted a protein of 154 kDa (2). Subsequent cloning revealed a cDNA corresponding to a protein with an additional 156 aa at the N terminus (44). An initiation codon was proposed based on alignment with eIF4G-3 and lack of any further upstream cDNA sequence. Further cloning revealed a cDNA that was 42 nt longer (45), indicating either that multiple mRNAs exist or that Imataka et al. (44) had not reached the 5'-end of the same mRNA. Finally, a fourth and fifth cDNA were reported (46). One of these independently confirmed some of the sequence reported by Johannes and Sarnow (45) but did not extend the cDNA. The other was collinear with the other cDNAs up to a point, upstream of which it deviated, suggesting a splice variant. None of these cDNAs provided a new in-frame AUG. Thus, the most upstream AUG codon for all five eIF4G-1 cDNAs reported to date is that originally proposed by Imataka et al. (44).

The present study was motivated by the fact that the relationship between the multiplicity of electrophoretic forms of the protein and the multiplicity of cDNAs is not known. Importantly, despite speculation based on cDNA sequences, the N terminus has not been established for any eIF4G-1 isoform. It is important to establish the actual protein structures, since eIF4G-1 isoforms differing at the N terminus may contain different binding sites for proteins involved in translational control. This study identifies an EST in the public databases as corresponding to a longer form of eIF4G-1 mRNA. Because it predicts a new upstream, in-frame AUG, it could encode an even longer isoform of eIF4G-1 than those predicted from previous cDNA sequences. Mass spectrometric analysis confirmed that this was the case and also established the structure of two other eIF4G-1 isoforms.

    EXPERIMENTAL PROCEDURES

Materials-- Porcine trypsin was obtained from Promega (catalog number V5111). Endoproteinase Lys-C (catalog number P3428) and bovine α-chymotrypsinogen A (catalog number C4879) were obtained from Sigma. Arg-C from Clostridium histolyticum was purchased from Worthington (catalog number LS001641). Asp-N (catalog number 1420488) and Complete, Mini, Protease Inhibitor Cocktail Tablets (catalog number 1836153) were obtained from Roche Molecular Biochemicals. m7GTP-Sepharose 4B was purchased from Amersham Biosciences.

Identification of a New EST for eIF4G-1-- As noted above, Bushell et al. (46) reported a novel cDNA related to eIF4G-1, GenBank™ nucleotide accession number AF002815. The hypothetical protein encoded by this cDNA was assigned GenBank™ protein accession number AAC78442. We compared the cDNA sequence corresponding to aa residues -24 to +49 of this protein² to the est_human database using the program blastn found under the NCBI BLAST suite of programs. We obtained a new EST derived from an adult melanoma: GenBank™ nucleotide accession number AL120751. The 215-nt segment from nt 25-240 of AL120751 exactly matched the 215-nt segment from nt 4-219 of AF002815.

The sequence reported for AL120751 covered only the most 5'-terminal 627 nt, even though the complete eIF4G-1 cDNA is predicted to contain 5510 nt (2, 44, 45). We obtained the plasmid corresponding to this EST, termed Homo sapiens cDNA clone DKFZp762O191, from the German Genome Project and determined additional DNA sequence information using the DNA sequencing facility at Iowa State University. The 3'-end of AL120751 (nt 4647-5306, using a composite numbering system)³ was determined using an oligo(dT) 3'-A primer. A 3'-terminal poly(A) tract of 111 nt was observed using a primer corresponding to the SP6 promoter in the vector. A sequence covering nt 439-1155 was obtained using the sense primer 5'-AACACGCCTTCTCAGCCCCGC-3'. This corrected an entry of "N" at nt 456 to G in the GenBank record of AL120751. A sequence covering nt 710-1399 was obtained using the antisense primer 5'-GGGGCAAGCTGGGGGAGGAGC-3'. This sequence exactly matched nt 328-1018 of cDNA AF104913, which encodes protein AAC82471. Finally, sequences covering nt 1-300 and 97-793 were obtained using the sense primer 5'-CGCCACGGCCGAAGCAGCTAG-3' and antisense primer 5'-AACACGCCTTCTCAGCCCCGC-3', respectively. Overall, this extended the sequence information for cDNA clone DKFZp762O191 by 1544 nt.

Generation of K562 Cell Lysates-- Human K562 cells were grown in RPMI medium (Invitrogen) containing 10% fetal bovine serum (Atlanta Biologicals, Norcross, GA). Cultures were grown to confluence in 175-cm² tissue culture flasks maintained in a humidified, 5% CO₂ environment at 37 °C. A standard preparation of lysate was derived from 3.2 × 10⁹ cells (16 T-175 flasks). Cells were resuspended on ice for 30 min in an equal volume of Buffer A (1% Triton X-100, 10 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM EDTA, 1 tablet of protease inhibitor mixture per 20 ml buffer) and then centrifuged at 25,000 × g for 20 min. The supernatant was frozen in liquid N₂ and stored at -80 °C.

Immunological Procedures-- Two antibodies against different regions of eIF4G-1 were used.⁴ Anti-Peptide 7 antibodies were obtained as described previously (2). Anti-Peptide 10 antibodies were produced and affinity purified using the peptide CRAQPPSSAASR, which corresponds to aa residues 55-65 of the consensus eIF4G-1 sequence² with an added N-terminal Cys residue, as described previously (2). Immunoblotting was carried out as described previously (47). Incubation with the anti-Peptide 7 antibodies (1:1000) was carried out at room temperature for 1 h. Incubation with affinity-purified anti-Peptide 10 antibodies (1:50) was at 4 °C overnight. Both antibodies were visualized with alkaline phosphatase-conjugated goat anti-rabbit IgG antibodies (Vector Laboratories, Burlingame, CA) at 1:1000.

Purification of cpN-- Recombinant 2A protease from Coxsackievirus serotype B4 was prepared as described previously (41). K562 cell lysates were cleared after thawing by spinning at 25,000 × g for 20 min at 4 °C. The supernatant was incubated with recombinant 2A protease at a final concentration of 50-100 µg/ml for 30 min on ice. (As 2A protease preparations differed somewhat in activity, the amount was adjusted to produce complete cleavage of eIF4G-1, based on western blotting with anti-Peptide 7 antibodies.) The 2A protease-treated lysate was then subjected to m7GTP-Sepharose affinity chromatography as described previously (48) but with the following modifications. One standard batch of K562 lysate was gently rotated with 2 ml of m7GTP-Sepharose for 1 h at 4 °C. The slurry was then transferred to a column and the flow-through fraction was reapplied to the column. The column was washed first with 12 volumes of Buffer B55 (20 mM MOPS, pH 7.6, 10% (w/v) glycerol, 0.5 mM EDTA, 0.25 mM dithiothreitol, 25 mM NaF, 55 mM KCl) and then with 3 volumes of 100 µM GTP in Buffer B55. Proteins were eluted with 4 volumes of 200 µM m7GTP in Buffer B55. The eluate was frozen in liquid N₂ and stored at -80 °C.

Electrophoretic Separation of cpN Isoforms-- Linear polyacrylamide was prepared (49) and added to each cpN sample at a final concentration of 120 µg/ml as carrier. Ice-cold 100% trichloroacetic acid was added to a final concentration of 10% and the sample allowed to stand on ice for 30 min. The precipitate was collected by centrifugation at 25,000 × g for 20 min, washed four times with ice-cold 80% aqueous acetone, and dissolved in 1× SDS-loading buffer (adjusted to pH 8.2; Ref. 50). Alkylation of Cys residues was performed as described previously (50). The sample was then adjusted to pH 6.8 with HCl, and the cpN isoforms were separated by electrophoresis on a 16 × 20-cm gradient gel (8-15%; acrylamide:N,N'-bisacrylamide, 30:0.8) with a 4% stacking gel. Electrophoresis was carried out at a constant current of 16 mA for 4 h followed by 24 mA for 16-20 h. Protein bands were stained with Coomassie Blue, excised, and stored at -80 °C.
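The trichloroacetic acid step above states only the stock (100%) and final (10%) concentrations. As a rough illustration of the arithmetic, the following sketch computes how much stock would be added to a given sample volume. It assumes that volumes are additive and that the sample contains no trichloroacetic acid beforehand; the function name and the example volume are hypothetical and do not come from the original protocol.

```python
def stock_volume_to_add(sample_volume: float, stock_pct: float, final_pct: float) -> float:
    """Volume of stock to add so the mixture reaches final_pct.

    Mass balance (assumes additive volumes and an analyte-free sample):
        stock_pct * v = final_pct * (sample_volume + v)
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must be positive and below stock_pct")
    return sample_volume * final_pct / (stock_pct - final_pct)


# Hypothetical example: a 900-µl sample brought to 10% using 100% stock.
volume_ul = stock_volume_to_add(900.0, 100.0, 10.0)
```

Under these assumptions, one volume of stock is added per nine volumes of sample.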

Protease Digestion-- Gel pieces were minced and further destained with three washes (400 µl each) of 50% acetonitrile, 25 mM ammonium bicarbonate, pH 8.0. The polyacrylamide was dehydrated for 5 min with 100% acetonitrile followed by vacuum centrifugation in a Savant Speedvac for 30 min. In-gel digestion with trypsin was performed overnight at 37 °C by the addition of 15 µl of enzyme (10 µg/ml) in 25 mM NH₄HCO₃, pH 8.0, to each gel piece. For Lys-C digestion, a stock enzyme solution of 50 µg/ml was made in 0.1 M Tris-HCl, pH 8.0, and 15 µl were added to each gel piece. Digestion was performed overnight at 37 °C. For Arg-C digestion, the enzyme was reconstituted at 2 mg/ml in 1 mM CaCl₂, 2.5 mM dithiothreitol. In-gel digestion was carried out at room temperature overnight in 15 µl of 50 mM sodium phosphate, pH 7.6, 2.5 mM dithiothreitol by the addition of 3 µg enzyme per gel slice. Finally, for Asp-N digestion, the enzyme was reconstituted in H₂O to give a concentration of 40 µg/ml in 10 mM Tris, pH 7.5. In-gel digestion was carried out overnight at 37 °C in 15 µl of 50 mM sodium phosphate, pH 8.0, by the addition of 4 µl of enzyme per gel slice.

Mass Spectrometry-- Mass spectrometric analysis was performed at either the LSUHSC-S Research Core Facility or the Laboratory for Mass Spectrometry and Gaseous Ion Chemistry, Rockefeller University. At LSUHSC-S, MALDI-TOF-MS was performed on a PerSeptive Biosystems Voyager-DE PRO Biospectrometry Workstation. At Rockefeller University, two instruments were used. Peptide mapping by MALDI-TOF-MS was performed on a PerSeptive Biosystems Voyager-DE STR Biospectrometry Workstation. Sequence information was obtained by LC-ESI-MS/MS on a Finnigan LCQ-DECA ion trap mass spectrometer.

Peptides were prepared for MALDI-TOF-MS after proteolytic digestion by extraction from gel pieces twice with 50-µl portions of 50% acetonitrile, 5.0% trifluoroacetic acid. Peptides in the extract were dried, dissolved in 15 µl of 0.1% trifluoroacetic acid, and purified on a ZipTip (Millipore). They were eluted with 2 µl of the appropriate organic acid (matrix) dissolved in 50% acetonitrile, 0.1% trifluoroacetic acid, and spotted on a MALDI plate. For masses in the 800-6000 Da range, the matrix was a 0.01 mg/ml solution of α-cyano-4-hydroxy-trans-cinnamic acid (Sigma, catalog number C-2020). For masses in the 6000-26,000 Da range, the matrix was a 0.01 mg/ml solution of sinapinic acid (3,5-dimethoxy-4-hydroxy-trans-cinnamic acid; Aldrich, catalog number D13,460-0). When internal calibration was used, the eluting solution also included peptide mass standards. Data were summed over 50-100 acquisitions in delayed extraction mode.

At LSUHSC-S, data analysis was performed using the Data Explorer software, version 3.5-4.0. At Rockefeller University, analysis was performed using the software program M over Z from Proteometrics, LLC. In both cases, spectra were subjected to algorithms for base-line correction and noise removal at two S.D. values. Throughout the current report, monoisotopic masses are reported for <2500-Da peptides, whereas average masses are reported for >2500-Da peptides. Theoretical masses were determined using the Peptide Mass tool at the ExPASy Proteomics website (ca.expasy.org/tools/). Peaks seen in both the sample of interest and also in a blank gel treated identically were eliminated from further consideration. For automatic matching of observed to predicted peptide masses, we used Auto-MS Fit, version 1.2.18 (PerSeptive Biosystems). For manual matching, we considered that peptides matched if their masses were within 1 Da for the range 800-10,000 Da. Because of unresolved microheterogeneity, caused by Met oxidation for example, peptides with molecular masses above 10,000 Da were considered to be matches if the experimentally determined masses were within 0.6% of the calculated values.
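The manual matching criteria stated above (within 1 Da for 800-10,000 Da, within 0.6% above 10,000 Da, and monoisotopic versus average masses split at 2500 Da) can be written down as a small sketch. This is an illustration of the stated rules only, not the software used in the study. The function names are hypothetical, and the handling of the boundaries (inclusive tolerances, masses of exactly 2500 Da, and masses below 800 Da) is an assumption, because the text does not specify it.

```python
def masses_match(observed_da: float, predicted_da: float) -> bool:
    """Apply the manual matching tolerances described in the text."""
    if predicted_da < 800:
        # Assumption: masses below 800 Da lie outside the stated range
        # and are treated as unmatched.
        return False
    if predicted_da > 10_000:
        # Above 10,000 Da: within 0.6% of the calculated value.
        return abs(observed_da - predicted_da) <= 0.006 * predicted_da
    # 800-10,000 Da: within 1 Da.
    return abs(observed_da - predicted_da) <= 1.0


def reported_mass_type(predicted_da: float) -> str:
    """Monoisotopic below 2500 Da, average above.

    Assumption: exactly 2500 Da is treated as 'average'.
    """
    return "monoisotopic" if predicted_da < 2500 else "average"
```

For example, a predicted mass of 12,000 Da tolerates a deviation of up to 72 Da under the 0.6% rule, whereas a 1500-Da peptide must fall within 1 Da.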

LC-ESI-MS/MS analysis was carried out using a Smart System (Amersham Biosciences) equipped with 10-ml syringe pumps and a pre-column flow splitter (Michrom Bioresources, Auburn, CA). The chromatographic eluate was monitored by on-line mass spectrometry using an electrospray ion trap mass spectrometer, model LCQ-DECA (Finnigan ThermoQuest, San Jose, CA). Peptide mixtures were diluted 10-fold with 0.01% trifluoroacetic acid (v/v) in water/methanol/acetic acid (949:50:1, v/v/v) and loaded on a reverse phase Magic C18 column (50 × 0.2 mm inside diameter; pore size, 100 Å; particle size, 5 µm) from Michrom Bioresources (Auburn, CA). Peptide separation was performed at room temperature with a fast-rising methanol gradient (in 5 min) at a flow rate of 2.8 µl/min. The eluate was transferred through a 50-µm inside diameter fused silica capillary from the column to the ion source of the mass spectrometer and electrosprayed at 2.8-3.2 kV. The transport capillary in the mass spectrometer was kept at 130-150 °C in order to assist desolvation.

    RESULTS

Identification of a New EST for eIF4G-1-- As noted above, several cDNAs for eIF4G-1 have been reported (2, 44-46). Four of the polypeptides theoretically encoded, NP_004944, AAC82471, AAC78443, and eIF4G-Iext, are collinear, but one of them, AAC78442, deviates from the others upstream of aa 50 (Fig. 1). In some cases, the polypeptide sequences deposited in GenBank™ represent those encoded by the longest open reading frame following an AUG codon (NP_004944, AAC82471), but in other cases, the polypeptide sequence continues in the N-terminal direction despite the absence of an AUG codon (AAC78442, AAC78443, eIF4G-Iext). We used sequence information from the cDNA with the longest open reading frame (GenBank™ nucleotide accession number AF002815, which encodes GenBank™ protein accession number AAC78442; see Fig. 1) to search for additional ESTs. The result was a new EST entered in GenBank™ under nucleotide accession number AL120751 (Fig. 1). The EST sequence reported (627 nt) was not previously identified as corresponding to eIF4G-1 mRNA, but a comparison with other eIF4G-1 cDNAs indicated an identical sequence over 215 nt.


Fig. 1.   Alignment of polypeptide sequences derived by theoretical translation of cDNAs and an EST corresponding to eIF4G-1. Protein ID generally refers to the GenBank™ protein accession numbers. One exception is AL120751, which is the GenBank™ nucleotide accession number for the sequence of an EST. The sequence shown is the conceptual translation product in the +2 reading frame. The other exception is eIF4G-Iext, which refers to the sequence published in Ref. 45. Met residues are shaded. The underlined aa residues (55-65) indicate the epitope used to generate anti-Peptide 10 antibodies. Names for hypothetical forms of eIF4G-1 appear below potential N-terminal Met residues. AAC78442, AAC78443, and AL120751 do not continue toward the C terminus, because they are derived from only partial cDNA sequences. The aa residue 61 shown for AL120751 differs from the GenBank™ entry based on sequencing reported in the present work (i.e., ANT → AGT, encoding Ser). The aa residue 214 shown for NP_004944 differs from the GenBank™ record because of resequencing (52, 67) of the cDNA (AGT → GGT, encoding Gly instead of Ser).

We obtained the plasmid corresponding to this GenBank™ entry from the German Genome Project. Sequencing portions of the plasmid insert from both the 5' and 3' termini revealed: (i) agreement over the entire EST sequence for AL120751 except for a single nucleotide sequencing ambiguity at nt 456; (ii) preservation of reading frame between the 3'-end of the EST and other eIF4G-1 cDNA sequences; (iii) the presence of a 111-nt poly(A) tract; (iv) collinearity with the cDNAs encoding AAC82471, AAC78443, and eIF4G-Iext (Fig. 1); and (v) partial collinearity with the cDNAs encoding NP_004944 and AAC78442 (see "Experimental Procedures"). If the new cDNA is collinear with these cDNAs throughout its entire length, it corresponds to an mRNA of 5510 nt.³

Surprisingly, the polypeptide encoded by AL120751 deviates from the AAC78442 polypeptide upstream of aa 50 (Fig. 1), even though sequences from the cDNA encoding it, AF002815, were used to find AL120751. Further comparison of AL120751, AF002815, and the corresponding human genomic sequence (GenBank™ nucleotide accession number AC078797.8; gi number 15887175; nt 122674-133235) indicates that AF002815 lacks two exons that are present in AL120751 (data not shown). The most 5' exon, however, is common to AF002815 and AL120751. The absence of these two internal exons from AF002815 results in a different reading frame upstream of aa 50. On this basis, we suggest that AF002815 represents a splice variant that lacks internal exons present in the AL120751 cDNA.

The new polypeptide encoded by AL120751 is collinear with AAC82471, AAC78443, eIF4G-Iext, and NP_004944, except for the Arg in the first position of AAC78443 (aa 30 using the common numbering system²), which is Pro in eIF4G-Iext and AL120751 (Fig. 1). We assume this difference is due to an incomplete codon at the junction between the AAC78443 cDNA and the vector or a sequencing error. AL120751 predicts the longest eIF4G-1 protein reported to date, adding 22 aa residues to the eIF4G-Iext sequence. Furthermore, it provides a new potential translational initiation site. Upstream of the AUG that encodes Met-1 of AL120751 are both in-frame and out-of-frame stop codons. These features reduce the likelihood that a start codon even further upstream could be utilized.
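The reasoning above rests on locating in-frame AUG codons and the stop codons that lie upstream of them. A minimal sketch of such a scan is shown below, written for a cDNA (ATG rather than AUG). The function names are hypothetical, and the toy sequence is invented purely for illustration; it is not taken from AL120751 or any other record.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_frame(cdna: str, frame: int) -> tuple[list[int], list[int]]:
    """Return 0-based positions of ATG and stop codons in one reading frame."""
    seq = cdna.upper()
    atgs, stops = [], []
    # Step through complete codons only, starting at the frame offset.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            atgs.append(i)
        elif codon in STOP_CODONS:
            stops.append(i)
    return atgs, stops


def upstream_stops(cdna: str, frame: int, atg_pos: int) -> list[int]:
    """Stop codons in the given frame that lie 5' of a chosen ATG."""
    _, stops = scan_frame(cdna, frame)
    return [s for s in stops if s < atg_pos]


# Invented toy sequence (assumption, for illustration only).
TOY_CDNA = "TAAGCCATGAAAATGGGG"
toy_atgs, toy_stops = scan_frame(TOY_CDNA, 0)
```

Running the scan with the frame of the open reading frame gives in-frame stops; running it with the other two offsets gives the out-of-frame ones. An in-frame stop 5' of a candidate ATG means that no start codon further upstream could extend the same open reading frame.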

Immunological Analysis of eIF4G-1 Electrophoretic Variants-- To determine whether any of the multiple bands of eIF4G-1 typically seen by SDS-PAGE contain N-terminal sequences predicted by these cDNAs, we developed an anti-peptide antibody (anti-Peptide 10) against aa 55-65 (underlined in Fig. 1).²,⁴ Both full-length eIF4G and cpN were separated by analytical gradient gel SDS-PAGE (Fig. 2). By silver staining, three partially resolved, major bands are observed at apparent molecular mass ≈220 kDa in full-length eIF4G-1, as well as numerous minor bands (lane 2). Three major and several minor bands are also present in cpN (lane 1). Based on prior studies using various antibodies (14, 38, 51, 52), these major bands in the cpN preparation are the N-terminal domains of the major proteins in the eIF4G preparation.


Fig. 2.   Immunological analysis of full-length eIF4G-1 and cpN. Total protein from K562 cells was subjected to m7GTP-Sepharose affinity chromatography followed by analytical gradient gel SDS-PAGE. Prior to chromatography the lysate was either untreated (lanes 2, 4, and 6) or treated with 2A protease (lanes 1, 3, and 5). Protein was detected by silver staining (lanes 1 and 2) or by immunoblotting with anti-Peptide 7 (lanes 3 and 4) or anti-Peptide 10 (lanes 5 and 6) antibodies.

Some of the major bands in full-length eIF4G-1 detected by silver staining also reacted with the highly sensitive anti-Peptide 7 antibodies (lane 4), which were developed against eIF4G-1 aa residues 523-538 (2). As with the silver-stained preparation, the ~220-kDa cluster of proteins (lane 4) is not present in the cpN preparation (lane 3). Instead, cpN contains three major and several minor bands that are reactive with anti-Peptide 7 antibodies and migrate the same as Bands 1-3 of the silver-stained gel (cf. lanes 3 and 1). Most of the bands not reacting with this antibody are identified below by mass spectrometry. By contrast, with the new anti-Peptide 10 antibody, only the two slowest migrating forms were detected, whether in full-length eIF4G (lane 6) or cpN (lane 5). This suggests that only the two slowest forms contain aa 55-65 (Fig. 1).

Separation of cpN Isoforms-- The yield of eIF4G on m7GTP-Sepharose affinity chromatography, as measured with a monoclonal antibody, is higher if the eIF4G is first cleaved by poliovirus infection (36). Furthermore, as shown in Fig. 2, cpN isoforms were separated better by electrophoresis than isoforms of full-length eIF4G. We therefore chose to separate isoforms of cpN rather than intact eIF4G. We attempted two-dimensional gel electrophoresis of cpN, but the bands were extremely broad and failed to focus in the first dimension. This behavior has been previously observed (53). Instead, we obtained optimal resolution of cpN by one-dimensional SDS-PAGE on preparative (16 × 20 cm) gradient gels.

Strategy for Assigning eIF4G-1 Isoforms to Bands-- Application of anti-Peptide 7 and anti-Peptide 10 antibodies provided only limited information about the structures of the polypeptides making up the various electrophoretic bands. Initially, we attempted to identify them by N-terminal sequence analysis. The various cpN isoforms were separated by SDS-PAGE, blotted onto a polyvinylidene difluoride membrane, and analyzed by Edman degradation at the Macromolecular Structure Analysis Facility of the University of Kentucky. However, this failed to yield interpretable data, presumably due to blocked N termini.

We therefore turned to sequence information provided by cDNAs and ESTs. Five of the cDNA sequences (AAC82471, AL120751, AAC78443, eIF4G-Iext, and NP_004944) are theoretically translated into collinear polypeptide sequences. Due to the existence of multiple AUG codons, six eIF4G-1 isoforms can be predicted that correspond to alternative translational initiation sites (Fig. 1). We termed these hypothetical isoforms eIF4G-1a, -1b, -1c, -1d, -1e, and -1f (Table I). The proteolytic peptides of each eIF4G-1 isoform can be predicted from the composite cDNA sequence (Fig. 3). It is also possible to determine the masses of peptides actually present in proteolytic digests from the various electrophoretic bands by MALDI-TOF-MS (54). Putting information on theoretical and actual peptides together, we can deduce which hypothetical isoforms are consistent with the peptide pattern of each electrophoretic band. The masses of "diagnostic" peptides, i.e. those that are unique to the various hypothetical eIF4G-1 isoforms, are shown in Table II.
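The strategy just described (predict the peptides of each hypothetical isoform, then pick out those unique to one isoform) can be sketched with a simplified in silico digestion. The cleavage rules below are textbook simplifications and are assumptions of this sketch: trypsin cuts after Lys or Arg, Lys-C after Lys, Arg-C after Arg, and Asp-N before Asp, with no missed cleavages, no proline exception, and no posttranslational modification. The function names and the toy sequences are hypothetical and are not eIF4G-1 sequences.

```python
# Residues after which each enzyme is assumed to cleave (simplified rules).
CUT_AFTER = {"trypsin": "KR", "lys-c": "K", "arg-c": "R"}


def digest(protein: str, enzyme: str) -> list[str]:
    """Return the peptides of a complete digest, in N-to-C order."""
    if enzyme != "asp-n" and enzyme not in CUT_AFTER:
        raise ValueError(f"unknown enzyme: {enzyme}")
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if enzyme == "asp-n":
            # Asp-N cleaves on the N-terminal side of Asp.
            if aa == "D" and i > start:
                peptides.append(protein[start:i])
                start = i
        elif aa in CUT_AFTER[enzyme] and i + 1 < len(protein):
            # The other enzymes cleave on the C-terminal side.
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides


def diagnostic_peptides(isoforms: dict[str, str], enzyme: str) -> dict[str, set[str]]:
    """Peptides found in the digest of one isoform but in no other."""
    digests = {name: set(digest(seq, enzyme)) for name, seq in isoforms.items()}
    unique = {}
    for name, peps in digests.items():
        others = set().union(*(p for n, p in digests.items() if n != name))
        unique[name] = peps - others
    return unique


# Invented toy isoforms: the short form starts at an internal Met of the long form.
TOY_ISOFORMS = {"long": "MAKTPMSRDSK", "short": "MSRDSK"}
toy_diagnostic = diagnostic_peptides(TOY_ISOFORMS, "trypsin")
```

In this toy case the shorter form begins inside a tryptic peptide of the longer form, so its N-terminal peptide is a truncated version of that peptide. This mirrors the truncated N-terminal peptides (suffix C) described for Fig. 3. Predicted masses for such peptides would then be computed separately and compared with the observed spectra.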

                              
Table I
Proposed names, descriptions, and assignments of hypothetical eIF4G-1 isoforms


Fig. 3.   Predicted peptides from digestion of hypothetical eIF4G-1 isoforms with trypsin, Arg-C, Lys-C, and Asp-N. Peptides shown cover the cpN region of eIF4G-1 only. Tryptic peptides are numbered from N to C terminus using normal Arabic numerals (Peptides 1, 2, etc.). Lys-C peptide names are preceded by "k" (Peptides k1, k2, etc.). Arg-C peptide names are preceded by "r" (Peptide r1, r2, etc.). Asp-N peptide names are preceded by "d" (Peptide d1, d2, etc.). Peptides that form the N terminus of one of the hypothetical eIF4G-1 isoforms (Table I) are shaded. The suffixes C, C1, C2, etc. denote truncated N-terminal peptides (e.g. Peptide 2C is the C-terminal portion of Peptide 2). The initial aa sequences of predicted N-terminal peptides, prior to any posttranslational modification, are shown above the corresponding tryptic peptides (MNK, MNT, etc.).

                              
Table II
Predicted masses of diagnostic peptides that distinguish between hypothetical forms of eIF4G-1

Initial Characterization of Protein Bands in cpN Preparations by MALDI-TOF-MS-- To determine which bands present in cpN preparations were actually related to eIF4G-1, we subjected a preparation of cpN to electrophoresis and staining with Coomassie Blue (Fig. 4). This pattern differs from that of Fig. 2, lane 1, for two reasons. First, higher resolution was achieved on the preparative gel (Fig. 4) than the analytical gel (Fig. 2). Second, staining with Coomassie Blue (Fig. 4) is roughly proportional to protein quantity, but staining with silver (Fig. 2) is disproportionately strong for cpN polypeptides (38), perhaps because of several polyglutamic acid stretches in the sequence. Protein bands were excised, digested with endoproteases, and the resulting peptides analyzed by MALDI-TOF-MS (see "Experimental Procedures"). eIF4G-1 peptides were found in band b, band c plus band d (partially resolved), band e, and band g only, which correspond to Bands 1, 2, 3, and 4 in Fig. 2, respectively. These assignments, based on observed eIF4G-1 peptides, also agree with immunoblotting results (Fig. 2 as well as Refs. 14, 38, 51, and 52).


Fig. 4.   Identification of eIF4G-1 and other affinity-purified polypeptides by matching peptide masses to data bases. 2A protease-treated K562 lysate was subjected to m⁷GTP-Sepharose affinity chromatography as described under "Experimental Procedures." Bound proteins were then resolved by preparative gradient gel SDS-PAGE and stained with Coomassie Blue. Protein bands (arbitrarily labeled a through n) were excised, digested with trypsin, and subjected to MALDI-TOF-MS. The spectra were automatically matched to the NCBI protein data base by the Auto-MS Fit program, as described under "Experimental Procedures." Identities and confidence scores (MOWSE; Ref. 68) were generated for each band. The human proteins that best matched the indicated bands are: band a, hypothetical protein, GenBank™ protein accession number T17345, MOWSE = 2.84 × 10³; band b (same as Band 1 of Fig. 2), eIF4G-1, GenBank™ AAC78443, MOWSE = 6.85 × 10²; band c (same as Band 2 of Fig. 2), LINE1 reverse transcriptase homologue, GenBank™ P08547, MOWSE = 3.19 × 10²; subsequent analysis of this band revealed eIF4G-1 peptides as well; band d, p110 subunit of eIF3, GenBank™ NP_003743, MOWSE = 2.61 × 10⁵; band e (same as Band 3 of Fig. 2), no protein matched by Auto-MS Fit, but subsequent analysis revealed eIF4G-1 peptides; band f, no protein matched by Auto-MS Fit; band g (same as Band 4 of Fig. 2), no protein matched by Auto-MS Fit, but subsequent analysis revealed eIF4G-1 peptides; band h, no protein matched by Auto-MS Fit; band i, BiP, GenBank™ AAF13605, MOWSE = 1.09 × 10⁴; band j, PABP, GenBank™ NP_002559, MOWSE = 8.29 × 10³; band k, DEAD-box protein p72, GenBank™ NP_006377, MOWSE = 1.63 × 10³; band l, HSP70, GenBank™ NP_005336, MOWSE = 3.22 × 10⁵; band m, RNA helicase p68, GenBank™ NP_004387, MOWSE = 5.94 × 10³; band n, p36 subunit of eIF3, GenBank™ NP_003748, MOWSE = 7.82 × 10⁴.

Some non-eIF4G-1 proteins present in the cpN preparation were potentially of interest. In addition to eIF4G-1, band c contained peptides that matched a LINE1 reverse transcriptase homologue. Bands d and n matched the p110 and p36 subunits of eIF3, respectively (55). As noted above, several heat-shock proteins bind to eIF4G, so it is interesting to note that band i is the endoplasmic reticulum-resident chaperone BiP, and band l is hsp70-1. Band j is PABP, known to associate with eIF4G (10, 11). Finally, whereas the 46-kDa DEAD-box helicase eIF4A is known to bind eIF4G-1 at two sites in cpC (14, 20, 21), it is interesting to note that band k is a 72-kDa DEAD-box protein, and band m is a 68-kDa RNA helicase.

Band 1 Corresponds to eIF4G-1f-- To correlate hypothetical eIF4G-1 isoforms (Table I) with electrophoretic bands (Fig. 2), we examined the eIF4G-1-containing bands in more detail with four different proteolytic enzymes over several mass ranges. A typical spectrum for Band 1 (which corresponds to band b in Fig. 4) in the range m/z = 500-6000 is shown in Fig. 5. Of the tryptic peptides predicted to arise from eIF4G-1 (Fig. 3), those with masses similar to peaks observed in Fig. 5 are listed in Table III. The average deviation of observed versus theoretical masses was 0.19 Da. The major peaks (other than calibrants) matched predicted eIF4G-1 peptides, indicating that the sample was not grossly contaminated. Peptides below 800 Da were excluded from further consideration because of high noise levels and paucity of significant sequence information. In addition to the tryptic peptides listed in Table III, we detected in other spectra Peptides 11·12 (missed cleavage), 13·14, 16, 29·30·31, 30·31, and 37·38, some of which were either singly or doubly oxidized in Met residues (see below).
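
As an illustration of how predicted peptides of the kind shown in Fig. 3, and composite peptides arising from missed cleavages, might be generated, the following minimal sketch applies simplified cleavage rules to a sequence. The function names, the example sequence (of which only the initial MNK is taken from this report), and the rules themselves (which ignore, for example, reduced tryptic cleavage before Pro) are assumptions made for illustration; the sketch does not represent the software used in this study.

```python
# Simplified cleavage rules (assumptions for illustration only):
#   trypsin: C-terminal to Lys or Arg;  Lys-C: C-terminal to Lys;
#   Arg-C:   C-terminal to Arg;         Asp-N: N-terminal to Asp.
CLEAVE_AFTER = {"trypsin": "KR", "lys-c": "K", "arg-c": "R"}


def digest(sequence, enzyme):
    """Return the complete-digestion peptides, ordered N to C terminus."""
    enzyme = enzyme.lower()
    if enzyme == "asp-n":
        # Cut immediately before every Asp (except at position 0).
        cuts = [i for i, aa in enumerate(sequence) if aa == "D" and i > 0]
    else:
        residues = CLEAVE_AFTER[enzyme]
        # Cut immediately after every target residue (except the last one).
        cuts = [i + 1 for i, aa in enumerate(sequence)
                if aa in residues and i + 1 < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def with_missed_cleavages(peptides, max_missed=1):
    """Join adjacent peptides to model composite peptides (e.g. 11·12)."""
    out = []
    last = len(peptides) - 1
    for i in range(len(peptides)):
        for j in range(i, min(i + max_missed, last) + 1):
            out.append("".join(peptides[i:j + 1]))
    return out


# Hypothetical sequence; only the initial "MNK" comes from the text.
seq = "MNKAPDSRGLKDEF"
tryptic = digest(seq, "trypsin")
print(tryptic)                      # complete tryptic digestion
print(digest(seq, "lys-c"))
print(digest(seq, "arg-c"))
print(digest(seq, "asp-n"))
print(with_missed_cleavages(tryptic))
```

Each predicted peptide would then be converted to a mass and compared with the observed peaks, which is the comparison summarized in Table III.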


Fig. 5.   MALDI-TOF-MS analysis of tryptic peptides from Band 1 of a cpN preparation. Band 1 (see Fig. 2) from a preparative gradient gel (same as band b in Fig. 4) was subjected to trypsin digestion and MALDI-TOF-MS as described under "Experimental Procedures." Peaks attributable to eIF4G-1 peptides are indicated using the numbering system in Fig. 3. Composite peptides resulting from missed tryptic cleavages are shown with a center dot, e.g. Peptide 11·12·13. Peptides in which Met residues are oxidized have the suffixes ox, ox1, and ox2 (see Table III). Peaks resulting from peptide standards used for internal calibration are indicated as cal1, cal2, etc. For each peak, m/z values are compared with predicted mass values in Table III.

                              
Table III
Matching of peaks from Band 1 in the range of m/z = 800-6000 to theoretical eIF4G-1 tryptic peptides

Diagnostic tryptic peptides for Band 1 are shown in Fig. 6A (upper panel). Band 1 has peaks at m/z = 4534.7 and 4550.8, which are within 0.5 Da of the calculated masses of Peptide 2 (Table II) and the Met-oxidized form of Peptide 2 (+16.0 Da). This "one-Met signature" agrees with the presence of one Met residue in Peptide 2 (Fig. 3 and Table II) and contributes to the confidence of this assignment. Other peaks in Fig. 6A for Band 1 at m/z = 4629.7, 4645.9, and 4662.1 are within 0.6 Da of the predicted masses of Peptide 5 (Table II) and its derivatives containing either one or two oxidized Met residues, respectively. This "two-Met signature" agrees with the presence of two Met residues in Peptide 5 (Fig. 3 and Table II). Finally, the peak at m/z = 4728.6 matches Peptide 28·29 (Fig. 3 and Table III). In contrast, the spectrum of Band 2 contains peaks corresponding to Peptide 5 but not Peptide 2 (Fig. 6A, lower panel), nor were peaks corresponding to Peptides 2 or 5 observed in Bands 3 or 4 (not shown). Peptide 5 can come from eIF4G-1e or eIF4G-1f, but not from eIF4G-1a, -1b, -1c, or -1d (Fig. 3 and Table II). However, Peptide 2 can only come from eIF4G-1f (or an undescribed, longer form not predicted from the cDNA sequence). Thus, the presence of Peptide 2 in Band 1 but not in Bands 2, 3, or 4 is consistent with eIF4G-1f being present only in Band 1. 
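
The "Met signatures" described above can be recognized as runs of peaks spaced by approximately 16 Da. The following minimal sketch groups the Band 1 peak values quoted in the text into such runs. The function name, the 0.5-Da tolerance, and the grouping strategy are assumptions made for illustration and are not the procedure used to annotate Fig. 6A.

```python
OXIDATION = 16.0  # Da added per oxidized Met residue, as used in the text


def oxidation_ladders(peaks, tol=0.5):
    """Group peaks into runs in which each step is ~16 Da (within tol)."""
    peaks = sorted(peaks)
    used = set()
    ladders = []
    for start in peaks:
        if start in used:
            continue
        ladder = [start]
        while True:
            target = ladder[-1] + OXIDATION
            candidates = [p for p in peaks
                          if p not in used and abs(p - target) <= tol]
            if not candidates:
                break
            # Take the peak closest to the expected +16 Da position.
            ladder.append(min(candidates, key=lambda p: abs(p - target)))
        used.update(ladder)
        if len(ladder) > 1:
            ladders.append(ladder)
    return ladders


# Band 1 tryptic peaks quoted in the text (m/z).
band1 = [4534.7, 4550.8, 4629.7, 4645.9, 4662.1, 4728.6]
ladders = oxidation_ladders(band1)
for ladder in ladders:
    steps = len(ladder) - 1
    print(ladder, "-> consistent with", steps, "oxidizable Met residue(s)")
```

With these values the two-member run corresponds to the one-Met signature of Peptide 2 and the three-member run to the two-Met signature of Peptide 5, whereas the peak at m/z = 4728.6 is not part of any run.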


Fig. 6.   Identification of diagnostic peptides in Bands 1 and 2. Peptides produced from Bands 1 and 2 (Fig. 2) with four endoproteases were analyzed by MALDI-TOF-MS. Peaks marked with an asterisk arise from MALDI matrix adduction (69). A, peaks for tryptic peptides from Bands 1 and 2. Closed arrows (upper panel) indicate unmodified and Met-oxidized forms of Peptide 2. Open arrows (upper and lower panels) indicate unmodified and Met-oxidized forms of Peptide 5. B, peaks for Lys-C peptides from Bands 1 and 2. The closed arrow (upper panel) indicates Peptide k2. C, peaks for Arg-C peptides from Band 1. The arrows denote the Nα-acetylated, des-Met derivative of Peptide r1 and the Met-oxidized form. D, Asp-N peptides from Bands 1 and 2. Peptide d1 is present only in Band 1 (upper panel), while Peptide d1C1 is present only in Band 2 (lower panel). An internal calibrant peak for the protein standard horse apomyoglobin (16,952 Da) is present in both upper and lower panels of B and D.

Confirmation that eIF4G-1f is in Band 1 was obtained with endoprotease Lys-C. When this enzyme was used with Band 1, we detected Peptides k2, k4, k7, k12, k13, k15, and k21 (Fig. 3). A diagnostic peak at m/z = 17,770 was observed in Band 1 but not Band 2 (Fig. 6B). Peptide k2 has a predicted, unmodified mass of 17,695 Da (Table II), which is 75 Da less than the peak observed in Fig. 6B. Since Peptides 2 and 5 exist primarily in the oxidized forms (Fig. 6A), it is likely that the peak at 17,770 consists of a mixture of Peptide k2 and derivatives oxidized at one to four Met residues (calculated masses: 17,711, 17,727, 17,743, and 17,759 Da). The observed peak is within 0.06% of the latter calculated mass. These results indicate that Band 1 contains Peptide k2, which can only arise from eIF4G-1f (or a hypothetical larger form), whereas Bands 2, 3, and 4 do not contain Peptide k2.
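
The comparison made above for Peptide k2 can be reproduced with a few lines of arithmetic. The sketch below lists the calculated masses of Peptide k2 carrying zero to four oxidized Met residues and the relative deviation of the observed peak from each; the function names are hypothetical, and the observed m/z is compared directly with the calculated mass, as is done in the text.

```python
K2_UNMODIFIED = 17695.0  # Da, predicted mass of Peptide k2 (Table II)
OBSERVED = 17770.0       # m/z of the diagnostic Lys-C peak in Band 1
OXIDATION = 16.0         # Da added per oxidized Met residue


def oxidized_masses(base, max_met):
    """Masses of a peptide carrying 0..max_met oxidized Met residues."""
    return [base + n * OXIDATION for n in range(max_met + 1)]


def percent_deviation(observed, calculated):
    """Relative difference between observed and calculated values, in %."""
    return abs(observed - calculated) / calculated * 100.0


for n, mass in enumerate(oxidized_masses(K2_UNMODIFIED, 4)):
    dev = round(percent_deviation(OBSERVED, mass), 2)
    print(f"{n} Met oxidized: {mass:.0f} Da, deviation {dev}%")
```

The smallest deviation is obtained for the form with four oxidized Met residues (17,759 Da), in agreement with the 0.06% quoted above.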

Unfortunately, the tryptic and Lys-C peptide predicted to represent the extreme N terminus of eIF4G-1f, MNK, is too small to be unambiguously identified by our approach. Thus, we cannot conclude from the data in Fig. 6, A and B, whether Peptides 2 and k2 were contributed by eIF4G-1f or a hypothetical larger form. We therefore turned to a different enzyme, Arg-C, which produced a diagnostic peak at m/z = 4819.3 (Fig. 6C). The mass of Peptide r1, which is predicted to result from Arg-C digestion of eIF4G-1f, is 4908.6 Da (Table II). However, the mass of a derivative of Peptide r1 in which the N-terminal Met is removed and the resulting N-terminal Asn is Nα-acetylated (+42.0 Da) is 4819.4, within 0.1 Da of the observed peak. This agrees with our observation that the same band was refractory to analysis by Edman degradation (see above). A second peak at 4835.6 matches the mass of Met-oxidized, Nα-acetylated Peptide r1 in which the N-terminal Met is removed. This one-Met signature is consistent with the postulated removal of the N-terminal Met-1 and oxidation of Met-41 (Fig. 1). The peaks at m/z = 4819.3 and 4835.6 were observed in Band 1 but not Bands 2-4 (data not shown), which agrees with the hypothesis that eIF4G-1f is in Band 1 but not in the other bands.

To obtain additional evidence for eIF4G-1f in Band 1, we digested with Asp-N. A major peak with m/z = 19,371 was observed in Band 1 (Fig. 6D, upper panel) but not Band 2 (Fig. 6D, lower panel), Band 3, or Band 4 (data not shown). This is similar to the mass of Peptide d1, which is 19,278 Da (Table II). Peptide d1 contains five Met residues, although based on the data with Arg-C (Fig. 6C), the N-terminal Met may have been removed on at least some of the polypeptides. Subtracting one Met residue, adding an acetyl group, and oxidizing four Met residues gives a mass of 19,253. Although this does not agree exactly with the m/z of the observed peak, it is within 0.6% of this value. As noted above, exact matching of predicted and observed peptide masses can be compromised by microheterogeneity (partial removal of N-terminal Met, acetylation, oxidation). Nonetheless, the peak at m/z = 19,371 is at least consistent with the presence of Peptide d1 in Band 1, which is, in turn, indicative of eIF4G-1f.
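
The mass arithmetic applied above to Peptides r1 and d1 (removal of the N-terminal Met, acetylation, and Met oxidation) can be written out explicitly. In the minimal sketch below, the function name and the rounded value used for the average mass of a Met residue (131.2 Da) are assumptions; the acetyl and oxidation increments are those quoted in the text.

```python
MET_RESIDUE = 131.2  # Da, average mass of a Met residue (assumed rounding)
ACETYL = 42.0        # Da added by N-alpha-acetylation (value from the text)
OXIDATION = 16.0     # Da added per oxidized Met residue


def processed_mass(unmodified, remove_met=False, acetylate=False,
                   n_oxidized=0):
    """Mass after optional des-Met, N-acetylation, and Met oxidation."""
    mass = unmodified
    if remove_met:
        mass -= MET_RESIDUE
    if acetylate:
        mass += ACETYL
    mass += n_oxidized * OXIDATION
    return round(mass, 1)


# Peptide r1 (Arg-C), predicted unmodified mass 4908.6 Da.
r1_acetyl = processed_mass(4908.6, remove_met=True, acetylate=True)
r1_acetyl_ox = processed_mass(4908.6, remove_met=True, acetylate=True,
                              n_oxidized=1)
print(r1_acetyl, r1_acetyl_ox)   # compare with observed 4819.3 and 4835.6

# Peptide d1 (Asp-N), predicted unmodified mass 19,278 Da.
d1_processed = processed_mass(19278.0, remove_met=True, acetylate=True,
                              n_oxidized=4)
d1_deviation = abs(19371.0 - d1_processed) / 19371.0 * 100.0
print(d1_processed, round(d1_deviation, 1))  # compare with observed 19,371
```

Under these assumptions the calculated values for Peptide r1 fall within about 0.2 Da of the two observed Arg-C peaks, whereas the calculated value for Peptide d1 differs from the observed Asp-N peak by approximately 0.6%.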

Band 2 Represents a Mixture of Proteins, Including eIF4G-1e-- Three lines of evidence suggest that Band 2 is a mixture of proteins. First, the band is broader than Band 1, Band 3, or other non-eIF4G bands in Coomassie Blue-stained gels (Fig. 4, bands c and d), suggesting that proteins are unresolved. Second, Band 2 is, in fact, partially resolved into a doublet or triplet in some gels (data not shown). Third, the Auto-MS Fit program identified a LINE1 reverse transcriptase homologue and an eIF3 subunit in bands c and d (Fig. 4), despite the fact that they also contain numerous eIF4G-1 peptides (see below). Nonetheless, we were able to obtain structural information about the isoform of eIF4G-1 present in Band 2.

Of the tryptic peptides in cpN with masses >800 Da, we detected Peptides 3, 4, 5, 13, 30, and 38 in Band 2, either as single peptides or composite peptides resulting from missed cleavages. A set of peaks with m/z = 4630.0, 4646.3, and 4662.1 was observed for Band 2 (Fig. 6A, lower panel), corresponding to the two-Met signature of Peptide 5 (Table II). The presence of Peptide 5 is diagnostic of a protein being initiated with a Met upstream of eIF4G-1d, i.e. either eIF4G-1e or eIF4G-1f (Fig. 3). We also detected Peptide 13, which indicates the eIF4G-1 isoform present is initiated upstream of eIF4G-1a (data not shown). Yet Band 2 lacks Peptide 2 (Fig. 6A, lower panel), indicating it does not contain eIF4G-1f. It is clear from the results with Band 1 that Peptide 2 would have been detected if present (Fig. 6A, upper panel). The presence of Peptide 5 but absence of Peptide 2 indicates that Band 2 contains eIF4G-1e.

With Lys-C, we detected Peptides k3, k4, k5, k7, k8, k9, k10, k13, k14, k15, k16, k18, k19, k20, and k21 as either single or composite peptides (Fig. 3). Unfortunately, none of these peptides gave insight into the form of eIF4G-1 present, other than it was upstream of eIF4G-1a. Specifically, no evidence for the diagnostic peptide k2C1 was seen.

Digestion with Asp-N revealed a peak at m/z = 15,387 in Band 2 (Fig. 6D, lower panel). This m/z value is similar to that of Peptide d1C1 (15,300), which represents the N terminus of eIF4G-1e (Table II). Peptide d1C1 contains four Met residues. If each Met were oxidized, it would add 64 Da to the mass of this peptide, bringing the m/z value to 15,364 Da. The resulting difference between the observed and calculated values is 0.1%, which is considerably less than the difference for any other peptide predicted from the six hypothetical isoforms of eIF4G-1. This peak was not observed in Band 1 (Fig. 6D, upper panel), Band 3, or Band 4 (data not shown), providing additional evidence that eIF4G-1e is present only in Band 2.

Band 3 Contains eIF4G-1c-- The only tryptic peptides with masses >800 Da detected in Band 3 were Peptides 5C2, 12, 14, and 15. This is considerably fewer than for Band 1, presumably due to the lower amount of protein typically seen in this band (Fig. 2). A diagnostic peak was found in Band 3 at m/z = 2426.20 that was not present in Band 1 (Fig. 7A, lower panel). This is 42.03 Da more than the mass of Peptide 5C2 (Table II), which is the postulated N terminus of eIF4G-1c. This peak is therefore consistent with an Nα-acetylated form of Peptide 5C2, which would agree with our inability to obtain information for this band from Edman degradation (see above). A second peak was detected for Band 3 at m/z = 2442.23 (Fig. 7A, lower panel). This one-Met signature agrees with the presence of one Met residue in Peptide 5C2 (Table II) and suggests that, unlike eIF4G-1f, the N-terminal Met is not removed before acetylation of eIF4G-1c. Alternatively, the nascent polypeptide may have been eIF4G-1d from which Met-88 was removed, followed by Nα-acetylation of Met-89 (Fig. 1).


Fig. 7.   Identification of diagnostic peptides in Band 3. A, peaks for tryptic peptides from Bands 1 and 3. Arrows (lower panel) indicate Nα-acetylated Peptide 5C2 and its Met-oxidized derivative. A calibrant peak (ACTH clip 18-39, calculated mass 2465.20 Da) is present in both samples. B, peaks for Lys-C peptides from Bands 1 and 3. Arrows (lower panel) denote Met-oxidized derivatives of Nα-acetylated Peptide k2C3, which are unique to Band 3. C, peaks for Asp-N peptides from Band 3. The group of peaks match masses of Met-oxidized and unoxidized forms of Nα-acetylated Peptide d1C3.

Additional evidence for diagnostic peptides in Band 3 was sought using Lys-C. Peptides k2C3, k3, k4, k5, k7, k13, k14, k15, k16, k17, k18, k19, k20, and k21 were detected as either single or composite peptides. Diagnostic peptides were found at m/z = 9124.1 and 9139.5 (Fig. 7B, lower panel). These are within 1 Da of the predicted masses for the Nα-acetylated form of Peptide k2C3 containing either one (9123.3 Da) or two (9139.3 Da) oxidized Met residues. This two-Met signature is consistent with the presence of two Met residues in Peptide k2C3 (Table II). The peak at m/z = 8883.3 of Band 1 corresponds to doubly charged Peptide k2 (predicted m/z = 8847.9 for the unoxidized peptide; see Fig. 6B). Its breadth is presumably due to microheterogeneity from multiple oxidized Met residues.

Finally, we analyzed the Asp-N peptides of Band 3 (Fig. 7C). A unique peptide at m/z = 10,317 matched the predicted mass for Nα-acetylated Peptide d1C3 (10,317 Da). Similarly, the observed peaks at m/z = 10,333 and 10,349 matched the masses of the singly and doubly oxidized forms of this acetylated peptide (predicted at 10,333 and 10,349 Da). These peptides provide additional evidence for the presence of eIF4G-1c in Band 3.

In a separate experiment, MS fragmentation data (Fig. 8) confirmed our assignment of the tryptic peptide detected at m/z = 2426.20 in Fig. 7A as Nα-acetylated Peptide 5C2 (cDNA-derived sequence given in Fig. 1).


Fig. 8.   LC-ESI/MS spectrum of Peptide 5C2 in Band 3. Fragmentation analysis of Nα-acetylated Peptide 5C2 shows ions of the b (N-terminal) and y (C-terminal) series. The y fragments (in the range y3+ to y20+) all match their predicted masses, whereas the detected b fragments give values that are shifted by +42 Da, indicating an acetylated Met residue at position 1.
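
The reasoning behind the fragmentation analysis, that an N-terminal modification shifts every b ion but leaves the y ions unchanged, can be illustrated with a short calculation. The sketch below uses a hypothetical five-residue peptide (not the sequence of Peptide 5C2, which is given in Fig. 1) and standard monoisotopic residue masses; the function names are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "K": 128.09496, "M": 131.04049}
PROTON = 1.00728   # mass of the charge-carrying proton
WATER = 18.01056   # H2O retained by y ions
ACETYL = 42.01057  # C2H2O added by N-terminal acetylation


def b_ions(seq, n_term_mod=0.0):
    """Singly charged b1..b(n-1); every b ion contains the N terminus."""
    total, out = n_term_mod, []
    for aa in seq[:-1]:
        total += RESIDUE[aa]
        out.append(round(total + PROTON, 4))
    return out


def y_ions(seq):
    """Singly charged y1..y(n-1); these fragments lack the N terminus."""
    total, out = WATER, []
    for aa in reversed(seq[1:]):
        total += RESIDUE[aa]
        out.append(round(total + PROTON, 4))
    return out


peptide = "MSGAK"  # hypothetical sequence, NOT Peptide 5C2
plain_b = b_ions(peptide)
acetyl_b = b_ions(peptide, n_term_mod=ACETYL)
shifts = [round(a - p, 3) for a, p in zip(acetyl_b, plain_b)]
print("b ions (unmodified):", plain_b)
print("b ions (acetylated):", acetyl_b)
print("shift of each b ion:", shifts)
print("y ions (unaffected):", y_ions(peptide))
```

Every b ion of the acetylated peptide is displaced by approximately 42 Da, whereas the y series, which does not include the N-terminal residue, is identical for the modified and unmodified peptides.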

Band 4 Represents a Mixture of Proteins That May Include eIF4G-1a-- The amounts of eIF4G-related material in Band 4 were insufficient to make a definitive assignment. However, peaks were detected that suggest the presence of eIF4G-1a. Specifically, we detected a peak in tryptic digests at 538.24 Da that is within 1 Da of Peptide 13C with one Met oxidized (537.25 Da). We also detected a peak in tryptic digests at 580.25 that is within 1 Da of the Nα-acetylated, Met-oxidized form of Peptide 13C (579.26 Da). In Asp-N digests, we detected a peak at m/z = 3574.1 that is within 1 Da of Peptide d3C (3575.0 Da; Table II). We also detected a peak at m/z = 3616.1 that is similar to Nα-acetylated Peptide d3C (3617.0 Da). All of these were only slightly above the background, making assignment of structure uncertain. Band 4 often occurs as a diffuse doublet, indicating heterogeneity, and immunoblotting demonstrates that the eIF4G-1 isoforms in Band 4 are much less abundant than in Bands 1-3 (Fig. 2).

    DISCUSSION

The data presented here permit the identification of several eIF4G-1 primary structures and their assignment to electrophoretic bands. Band 1 consists of a novel isoform, here termed eIF4G-1f. This is also the most abundant form, as measured by Coomassie Blue staining and immunoreactivity to both anti-Peptide 7 and anti-Peptide 10 antibodies. Using trypsin or Lys-C, we observed Peptides 2 and k2, which can only arise from eIF4G-1f or a hypothetical protein initiated upstream of it. With Arg-C we detected the N-terminal peptide, Peptide r1, and its mass indicated that it was modified in two ways: the N-terminal Met was removed and the resulting Asn was Nα-acetylated. Asp-N produced Peptide d1, in agreement with the assignment of eIF4G-1f to Band 1, but it was too large (19,371 Da) to allow determination of the exact N-terminal structure. Thus, all of the data from Band 1 are consistent with its identity as eIF4G-1f. This isoform of eIF4G-1 was not predicted from any of the previously reported cDNA sequences. It is the longest isoform, with 22 additional aa residues at the N terminus compared with eIF4G-Iext (Fig. 1), the longest isoform previously postulated.

Several observations point to Band 2 being a mixture of proteins. Nonetheless, three types of data demonstrate that Band 2 contains eIF4G-1e. First, it reacts with anti-Peptide 10 antibodies, which are directed against an epitope in only eIF4G-1e and eIF4G-1f. Second, tryptic digests contain Peptide 5 but not Peptide 2, despite the fact that Peptide 2 is readily detectable in Band 1. Third, Asp-N digests contain Peptide d1C1 in Band 2 but not Band 1. Due to the size and microheterogeneity of Peptide d1C1, however, we were unable to determine the exact structure at the N terminus. eIF4G-1e is the isoform predicted from several of the cDNAs published before the present report (44-46), although its existence had not actually been demonstrated. Comparison of eIF4G-1e to eIF4G-1f by immunoreactivity with two different antibodies indicates that eIF4G-1e is less abundant (Fig. 2 and data not shown).

Both peptide matching and fragmentation data support the assignment of Band 3 as an Nα-acetylated form of eIF4G-1c. In digests with trypsin, Lys-C, and Asp-N, we detected the predicted N-terminal peptides and their Met-oxidation products, viz. Nα-acetylated Peptide 5C2, Nα-acetylated Peptide k2C3, and Nα-acetylated Peptide d1C3, respectively. These peptides were found in Band 3 but not other bands. Finally, the sequence of Nα-acetylated Peptide 5C2 was established by MS fragmentation. Based on the agreement between Coomassie Blue staining and immunoreactivity in comparison with Band 1, Band 3 appears to be homogeneous for eIF4G-1c. This form of eIF4G-1 is clearly present in K562 cells but is less abundant than eIF4G-1f and -1e (Fig. 2). eIF4G-1c is another novel isoform, not previously predicted from cDNA sequences.

Additional forms of eIF4G-1 may exist but were too low in abundance to be identified in the present study. Several peptides unique to eIF4G-1a were detected in Band 4, but we do not judge the evidence to be compelling. It is interesting to note that many bands besides Bands 1-4 react with anti-Peptide 7 antibodies (Fig. 2, lane 4). One might consider them to represent nonspecific binding of the antibodies to non-eIF4G-1 proteins except for one additional feature: they are not present in a preparation pretreated with 2A protease (Fig. 2, lane 3). The 2A protease of entero- and rhinoviruses is quite specific for a consensus aa sequence (38, 52, 56, 57). A few cellular proteins other than eIF4G are cleaved by 2A proteases (58-60), but two-dimensional electrophoresis reveals that the overwhelming majority of cellular proteins are unaffected by picornavirus infection (61). The likelihood that a non-eIF4G-1 protein would both react with anti-Peptide 7 antibodies and also be a substrate for 2A protease is quite remote. Thus, the numerous weak bands detected in anti-Peptide 7 immunoblots (Fig. 2, lane 4) may represent additional eIF4G-1 isoforms.

While there may well be additional isoforms of eIF4G-1, it is unlikely that there are any larger than eIF4G-1f. Even if longer cDNAs are found, three considerations make it unlikely that they will encode proteins containing an extension of the eIF4G-1f polypeptide sequence. First, we have shown that the structure of the major form (eIF4G-1f) is initiated from the AUG that is furthest upstream of the known cDNAs. Second, there are termination codons, both in-frame and out of frame, upstream of this AUG. It is possible that other forms of eIF4G-1 may be encoded by mRNAs arising by alternative splicing that contain different N-terminal aa sequences, e.g. AAC78442 (Fig. 1). However, we detected no peaks corresponding to either eIF4G-3 (4) or AAC78442. Third, eIF4G-1f is the principal constituent of Band 1, which is the slowest migrating immunoreactive band (Fig. 2). Drawing conclusions about eIF4G structure from electrophoretic mobility is complicated by the fact that the migration of eIF4G-1 on SDS-PAGE is aberrantly slow. The slowest eIF4G-1 isoform migrates at 220 kDa and the fastest, at 205 kDa (62), yet the calculated molecular masses range from 176 to 155 kDa (Table I). Thus, something besides the additional aa sequence reported here must account for the disparity. Despite the aberrant mobility, we observed that the order of bands follows the molecular mass of the isoforms, e.g. eIF4G-1f is both the largest form (Table I) and the slowest eIF4G band (Fig. 2), etc. Larger forms of eIF4G-1 may exist but not be purified by the method used in this study; it is worth noting that our method of affinity purification on m⁷GTP-Sepharose would exclude any hypothetical eIF4G-1 isoforms that lack an eIF4E-binding site such as the eIF4G homologue eIF4G-2 (3, 7, 8).

Although it is clear that the various isoforms of eIF4G-1 contain different N termini, the origin of these proteins cannot be determined from the results presented. Several formal possibilities can be considered. The first is alternative translation initiation of the same mRNA by leaky scanning. The optimal consensus sequence for initiation in animals is A/GXXAUGG (63). A "strong" initiation codon is considered to be one containing either the purine at -4, the G at +1, or both. The corresponding sequences for the various hypothetical eIF4G-1 isoforms are: eIF4G-1f, AAAAUGA; eIF4G-1e, CAAAUGA; eIF4G-1d, GUAAUGA; eIF4G-1c, AUGAUGA; eIF4G-1b, UUGAUGA; and eIF4G-1a, AUCAUGU. Thus, the strong initiation codons are in mRNAs for eIF4G-1f, -1d, -1c, and -1a, perhaps explaining the absence of eIF4G-1b. The second possibility for the multiplicity of eIF4G-1 isoforms is internal initiation of translation. Two separate sequences derived from the 5'-portion of eIF4G-1 mRNA can function as internal ribosome entry sites in cultured cells (45, 64, 65). A single mRNA corresponding to the longest cDNA reported to date (this report; 5510 nt) could be initiated from the 5'-end, giving rise to eIF4G-1f, and internally, giving rise to eIF4G-1c. The nucleotide sequence representing the 5'-untranslated region of an mRNA that would encode eIF4G-1d has already been shown to direct internal initiation of translation (45). This may provide a mechanism for eIF4G-1c expression, allowing for the N-terminal processing of the nascent polypeptide (removal of Met-88 and Nα-acetylation of Met-89). Finally, there may be several mRNAs differing in their 5'-terminal sequences that encode different eIF4G-1 isoforms, some of which exclude upstream AUG codons. These could arise from alternative splicing (e.g. NP_004944 and AAC78442) or alternative promoter usage.
Resolution of this question will require an investigation of the structure of the eIF4G-1 mRNA(s) present in the cell and their translational properties. The fact that we never observed peptides diagnostic for slower bands in faster bands argues against the faster bands representing proteolytic breakdown products, or polypeptides missing internal exons. The same conclusion can be drawn from the absence of any bands running faster than Bands 1 and 2 that bind anti-Peptide 10 antibodies (Fig. 2).
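
The classification of initiation codon contexts discussed above can be expressed as a simple rule applied to the seven-nucleotide sequences listed for each hypothetical isoform. In the sketch below, a context is scored as "strong" if it contains the purine at the first position of the consensus A/GXXAUGG (three nucleotides upstream of the AUG), the G immediately following the AUG, or both; the function name and the dictionary layout are assumptions made for illustration.

```python
# Seven-nucleotide contexts quoted in the text:
# 3 nt upstream + AUG + 1 nt downstream.
CONTEXTS = {
    "eIF4G-1f": "AAAAUGA",
    "eIF4G-1e": "CAAAUGA",
    "eIF4G-1d": "GUAAUGA",
    "eIF4G-1c": "AUGAUGA",
    "eIF4G-1b": "UUGAUGA",
    "eIF4G-1a": "AUCAUGU",
}


def is_strong(context):
    """True if the context matches A/G at position 1 of A/GXXAUGG,
    G directly after the AUG, or both."""
    if len(context) != 7 or context[3:6] != "AUG":
        raise ValueError("expected 3 nt + AUG + 1 nt")
    purine_upstream = context[0] in "AG"
    g_downstream = context[6] == "G"
    return purine_upstream or g_downstream


strong = [name for name, ctx in CONTEXTS.items() if is_strong(ctx)]
weak = [name for name in CONTEXTS if name not in strong]
print("strong:", strong)
print("weak:  ", weak)
```

Applied to the sequences above, this rule scores the contexts of eIF4G-1f, -1d, -1c, and -1a as strong and those of eIF4G-1e and -1b as weak; in every strong case the score is due to the upstream purine, because none of the six contexts has a G immediately after the AUG.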

We found a surprisingly large number of ligands co-purifying with cpN (Fig. 4). This type of analysis does not indicate whether these ligands are bound to the cpN fragment of all eIF4G-1 isoforms or to only a subset. These ligands include PABP and hsp70, which is consistent with previous studies showing these to be eIF4G-binding proteins (10, 11, 26). Less expected is the presence of the p110 and p36 subunits of eIF3. It is known that 11-subunit eIF3 has a high affinity for eIF4G, but the binding site has been localized in cpC, not cpN (14, 19). The association of eIF4G with the p110 and p36 subunits may mean that there are additional points of attachment in cpN. The presence of a putative DEAD-box protein and an RNA helicase in cpN is intriguing, since eIF4A, a different member of the DEAD-box RNA helicase family, is well established to bind eIF4G-1 at two distinct sites in cpC (14, 20, 21).

At present we have not determined whether there are differences in the biochemical properties of the different eIF4G-1 isoforms. Since eIF4G is a protein that binds an unusually large number of protein and RNA ligands, the presence of different N-terminal sequences may direct the binding of isoform-specific ligands. The various eIF4G isoforms may have different affinities for ribosomes or initiation factors that link them to ribosomes (e.g. eIF3), opening the possibility that eIF4G-1 isoforms participate in mRNA recruitment to different extents. The only study explicitly comparing eIF4G-1 isoforms showed that transfection of cDNAs expressing eIF4G-1e and eIF4G-1a gave similar growth efficiencies in soft agar, an assay for malignant transformation (66). Yet no in vivo expression studies have been performed to date on eIF4G-1f, the longest and most abundant form in K562 cells.

    ACKNOWLEDGEMENTS

We gratefully acknowledge the Deutsches Ressourcenzentrum für Genomforschung GmbH and the Deutsches Krebsforschungszentrum, Heidelberg, Germany, for cDNA clone DKFZp762O191 and the LSUHSC-S Research Core Facility.

    FOOTNOTES

* This work was supported by Grants GM20818 and RR00862 from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

|| To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, Louisiana State University Health Sciences Center, 1501 Kings Highway, Shreveport, LA 71130-3932. Tel.: 318-675-5161; Fax: 318-675-5180; E-mail: rrhoad@lsuhsc.edu.

Published, JBC Papers in Press, January 30, 2002, DOI 10.1074/jbc.M111134200

2 Various numbering systems have been used in previous reports dealing with the human eIF4G-1 proteins. To allow comparison between sequences, we have performed an alignment of the proteins predicted from all eIF4G-1 cDNAs. The aa numbers used throughout the present work are based on this alignment, rather than the numbers used in the original publications, and are shown in Fig. 1.

3 The sequence deposited as GenBank™ nucleotide accession number AL120751 contains only 627 nt of the cDNA represented by clone DKFZp762O191. The current report confirms the entire EST sequence and extends the cDNA sequence. Alignment with other eIF4G-1 cDNAs predicts a 5510-nt mRNA corresponding to clone DKFZp762O191. The nucleotide numbers used in this report, unless otherwise noted, are based on this composite mRNA in which nt 1 is the first nt of AL120751.

4 Names of eIF4G-1 peptides used for designing oligonucleotide probes and making site-specific antibodies in earlier studies (2, 52) are not related to the names for eIF4G-1 peptides used in the present report (Fig. 3). To avoid confusion, the earlier peptide names are always preceded by the term "anti-" in the present work.

    ABBREVIATIONS

The abbreviations used are: eIF, eukaryotic initiation factor; aa, amino acid; EST, expressed sequence tag; MALDI-TOF-MS, matrix-assisted laser desorption ionization-time of flight mass spectrometry; MOWSE, molecular weight search; nt, nucleotide residues; PABP, poly(A)-binding protein; MOPS, 4-morpholinepropanesulfonic acid; LC-ESI, liquid chromatography-electrospray ionization; ACTH, adrenocorticotropic hormone.

    REFERENCES