Arginyltransferase, Its Specificity, Putative Substrates, Bidirectional Promoter, and Splicing-derived Isoforms*

Substrates of the N-end rule pathway include proteins with destabilizing N-terminal residues. Three of them, Asp, Glu, and (oxidized) Cys, function through their conjugation to Arg, one of the destabilizing N-terminal residues that are recognized directly by the pathway's ubiquitin ligases. The conjugation of Arg is mediated by arginyltransferase, encoded by ATE1. Through its regulated degradation of specific proteins, the arginylation branch of the N-end rule pathway mediates, in particular, cardiovascular development, the fidelity of chromosome segregation, and the control of signaling by nitric oxide. We show that mouse ATE1 specifies at least six mRNA isoforms, which are produced through alternative splicing, encode enzymatically active arginyltransferases, and are expressed at varying levels in mouse tissues. We also show that the ATE1 promoter is bidirectional, mediating the expression of both ATE1 and an oppositely oriented, previously uncharacterized gene. In addition, we identified GRP78 (glucose-regulated protein 78) and protein-disulfide isomerase as putative physiological substrates of arginyltransferase. Purified isoforms of arginyltransferase that contain the alternative first exons differentially arginylate these proteins in extracts from ATE1-/- embryos, suggesting that specific isoforms may have distinct functions. Although the N-end rule pathway is apparently confined to the cytosol and the nucleus, and although GRP78 and protein-disulfide isomerase are located largely in the endoplasmic reticulum, recent evidence suggests that these proteins are also present in the cytosol and other compartments in vivo, where they may become N-end rule substrates.

An essential determinant of one class of degrons, called N-degrons, is a destabilizing N-terminal residue of a substrate. The set of destabilizing residues in a given cell type yields a rule, called the N-end rule, which relates the in vivo half-life of a protein to the identity of its N-terminal residue (1, 22-26). In eukaryotes, the N-degron consists of three determinants: a destabilizing N-terminal residue of a protein substrate, its internal Lys residue(s) (the site of formation of a poly-Ub chain), and a conformationally flexible region(s) in the vicinity of these determinants that is required for the substrate's ubiquitylation and/or degradation (8, 23, 27-29).
The N-end rule has a hierarchic structure (Fig. 1A). In eukaryotes, N-terminal Asn and Gln are tertiary destabilizing residues (denoted as Ndt) in that they function through their enzymatic deamidation (30,31), to yield the secondary destabilizing N-terminal residues Asp and Glu (denoted as Nds). The destabilizing activity of the N-terminal Asp and Glu requires their conjugation, by ATE1-encoded isoforms of Arg-tRNA-protein transferase (arginyltransferase or R-transferase), to Arg, one of the primary destabilizing residues (denoted as Ndp) (26, 32-37). The latter N-terminal residues are recognized by E3 Ub ligases of the N-end rule pathway, called N-recognins (23-25, 38, 39) (Fig. 1A). The N-end rule pathway of the yeast Saccharomyces cerevisiae is mediated by a single N-recognin, UBR1 (40,41), while at least four N-recognins, including UBR1, mediate this pathway in mammals (24,25,38,39,42). In mammals and other eukaryotes that produce nitric oxide (NO), the set of arginylated residues contains not only Asp and Glu but also N-terminal Cys (C) (43), which is arginylated after its oxidation (36). The in vivo oxidation of N-terminal Cys requires NO, as well as oxygen (O2) or its derivatives (Fig. 1A) (26,37).
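The hierarchy just described can be written down as a small lookup, which may help in following the order of the modification steps. The sketch below is illustrative only: it covers just the residues named in the text for the mammalian arginylation branch, and the function name `steps_to_arginylation` and the label `Cys*` for oxidized Cys are conventions chosen here, not part of the original work.

```python
# Illustrative sketch of the arginylation branch of the mammalian N-end rule.
# Only residues named in the text are modelled; all names are hypothetical.

DEAMIDATION = {"Asn": "Asp", "Gln": "Glu"}   # tertiary -> secondary residues
ARGINYLATED = {"Asp", "Glu", "Cys*"}         # secondary residues; Cys* = oxidized Cys


def steps_to_arginylation(n_terminal_residue, cys_oxidized=False):
    """Return the ordered modification steps that end with N-terminal Arg,
    or None if the residue is not a substrate of the arginylation branch."""
    steps = []
    residue = n_terminal_residue
    if residue in DEAMIDATION:
        steps.append(f"deamidation: {residue} -> {DEAMIDATION[residue]}")
        residue = DEAMIDATION[residue]
    if residue == "Cys":
        if not cys_oxidized:
            # Unmodified N-terminal Cys is treated here as a non-substrate.
            return None
        steps.append("oxidation (requires NO and O2): Cys -> Cys*")
        residue = "Cys*"
    if residue in ARGINYLATED:
        steps.append(f"arginylation by R-transferase: Arg-{residue}")
        return steps
    return None


print(steps_to_arginylation("Gln"))
print(steps_to_arginylation("Cys", cys_oxidized=True))
```

A primary destabilizing residue such as Arg returns `None` in this sketch because it needs no prior modification; it is recognized directly by N-recognins.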
Although prokaryotes lack Ub conjugation and Ub itself, they contain (Ub-independent) N-end rule pathways (44-47). The sets of destabilizing residues in prokaryotic N-end rules contain secondary destabilizing (Nds) residues as well. Their identities (Arg and Lys in Escherichia coli; Arg, Lys, Asp, and Glu in some other prokaryotes) are either overlapping with, or distinct from, the eukaryotic Nds residues (47). The activity of Nds residues in prokaryotes requires their conjugation to Leu (an Ndp residue) by Leu-tRNA-protein transferases (L-transferases), in contrast to the conjugation of Arg (an Ndp residue in eukaryotes) to N-terminal Asp, Glu, or (oxidized) Cys (Fig. 1A) (44,47). Prokaryotic L-transferases are of two distinct types, differing in both amino acid sequence and substrate specificity. Remarkably, L-transferases of one class, encoded by bpt genes, are sequelogs of ATE1-encoded eukaryotic R-transferases, despite the fact that Bpt (prokaryotic) aminoacyltransferases conjugate Leu, rather than Arg, to the N termini of cognate substrates (47). (In this terminology, "sequelog" and "spalog" denote, respectively, a sequence that is similar, to a specified extent, to another sequence and a three-dimensional structure that is similar, to a specified extent, to another three-dimensional structure (48). Sequelog and spalog are mnemonically helpful, single word terms whose rigor-conferring advantage is their evolutionary neutrality. The sequelog terminology conveys the fact of sequence similarity (sequelogy) without evolutionary or functional connotations, in contrast to interpretation-laden terms such as homolog, ortholog, and paralog. The latter terms are compatible with the sequelog/spalog terminology and can be employed to convey understanding about functions and common descent, if this information is actually available (48).)
The functions of the N-end rule pathway include the regulation of import of short peptides (through the degradation, modulated by peptides, of the import's repressor) (41,49), the fidelity of chromosome segregation (through the degradation of a conditionally produced cohesin fragment) (50), the regulation of apoptosis (through the degradation of a caspase-processed inhibitor of apoptosis) (51,52), the regulation of meiosis (24), the leaf senescence in plants (53), as well as neurogenesis and cardiovascular development in mammals (26,36,37,39). Mutations in human UBR1, one of several functionally overlapping N-recognins (Fig. 1A) (25,39), are the cause of Johanson-Blizzard syndrome, which includes mental retardation, physical malformations, and severe pancreatitis (54). The abnormalities of UBR1-/- mice (38) include pancreatic insufficiency (54), a less severe version of the defect in human Johanson-Blizzard syndrome (UBR1-/-) patients.
The cardiovascular and (probably) other functions of the N-end rule pathway involve the arginylation-mediated degradation of RGS4, RGS5, and RGS16. These "GTPase-activating" proteins function by inhibiting the signaling by specific G proteins, and are themselves down-regulated through the NO/O2-dependent degradation by the N-end rule pathway. The N-terminal Cys residues of RGS4, RGS5, and RGS16 are oxidized in vivo at rates controlled by NO and oxygen, followed by the arginylation of oxidized Cys and processive proteolysis by the rest of the N-end rule pathway (Fig. 1A) (26,35,37). The arginylation branch of this pathway is also required for degradation of the in vivo-produced fragment of the mammalian RAD21/SCC1 subunit of cohesin. This fragment, a part of circuits that control mitosis and meiosis, is produced through a cleavage by separase (55) and bears N-terminal Glu (an Nds residue) in mammals (56) but N-terminal Arg (an Ndp residue) in S. cerevisiae (50). Given the tripartite structure of N-degrons, it is possible that some physiological substrates of R-transferase would be found to lack a complete N-degron, i.e. that their arginylation would not be followed by their degradation.
All eukaryotes examined, from fungi to plants and animals, contain both R-transferases and the N-end rule pathway. The former are sequelogous (similar in sequence) (48) throughout most of their ~60-kDa spans, even between highly divergent eukaryotes such as fungi and mammals (34). A single gene, ATE1, encodes R-transferase in both S. cerevisiae (32) and the mouse or human genomes (34,36), whereas plants such as Arabidopsis thaliana contain two genes, ATE1 and ATE2, which encode sequelogous R-transferases (53). Our previous studies described the cloning of yeast and mouse ATE1, and also the finding that mammalian (but not S. cerevisiae) R-transferase occurs as two isoforms, produced through alternative splicing of ATE1 pre-mRNA (32,34). The two mouse R-transferases were shown to be of identical sizes (516 residues), differing by a stretch of 43 residues, encoded by two alternative, adjacent and sequelogous 129-bp exons (34).

Fig. 1 legend (partial): The yellow ovals denote the rest of a protein substrate. MetAPs, methionine aminopeptidases. C* denotes oxidized N-terminal Cys, either Cys-sulfinic acid (CysO2(H)) or Cys-sulfonic acid (CysO3(H)), produced in reactions mediated by nitric oxide (NO) and oxygen (O2) or its derivatives, with subsequent arginylation of oxidized Cys by the ATE1-encoded isoforms of Arg-tRNA-protein transferase (R-transferase) (26). (NO is produced from Arg by NO synthases.) The type 1 and type 2 primary destabilizing N-terminal residues are recognized by multiple E3 ubiquitin ligases (N-recognins) of the N-end rule pathway, including UBR1 and UBR2. Through their other substrate-binding sites, these E3s also recognize internal (non-N-terminal) degrons in other substrates of the N-end rule pathway, denoted by a larger oval. The marks of omission after "calpains" refer to the expectation that physiologically relevant N-degrons can also be produced by intracellular proteases additional to those cited. B, MetAPs remove Met from the N terminus of a polypeptide if the residue at position 2 belongs to the set of residues shown (23,87).
In this work, we continued the analysis of mouse ATE1, identifying six splicing-derived isoforms of ATE1 mRNA and also discovering that the ATE1 transcriptional promoter is bidirectional, driving the expression of both ATE1 and an oppositely oriented, previously uncharacterized gene. In addition, we identified GRP78 (BiP) and protein-disulfide isomerase (PDI), two proteins located primarily in the endoplasmic reticulum (ER), as putative physiological substrates of R-transferases. Several lines of evidence (57-60) suggest that both GRP78 and PDI can be present in non-ER compartments as well, including the cytosol and the nucleus, where these proteins may be targeted by the N-end rule pathway.
A recent paper by Rai and Kashina (61) described two new splicing-derived mouse ATE1 isoforms, in addition to the two previously known ones (34). The authors claimed that the four R-transferases could arginylate unmodified N-terminal Cys and also that the two new R-transferases were inactive with N-terminal Asp or Glu (61). We show that both of these conclusions are incorrect: the activity of known R-transferases toward unmodified N-terminal Cys is negligible, and the two isoforms of R-transferase stated to be inactive with the N-terminal Asp and Glu (61) are actually active with these residues.
Two-dimensional Electrophoresis and Mass Spectrometric Identification of Proteins-Two-dimensional electrophoretic analyses of our samples were carried out by Kendrick Laboratories, Inc. (Madison, WI). Isoelectric focusing was performed using 13-cm strips (pH range 4-10) (Amersham Biosciences). Second-dimension fractionation was by SDS-10% PAGE, using 20-cm-long gels. The latter were stained with Coomassie Blue, treated with Enhancer (PerkinElmer Life Sciences), and vacuum-dried, followed by fluorography with Kodak BioMax film for ~2 weeks at -80°C. Image matching of fluorographs to Coomassie-stained gels was carried out manually. The relevant spots were excised from the gel, followed by processing for in-gel digestion with trypsin and mass spectrometry (MS/MS), which was performed at the Protein Analysis Facility of Columbia University (New York).
Northern Hybridization-Multiple tissue Northern blots containing 2 µg of mouse poly(A)+ RNA per lane (Clontech) were probed with a 701-bp DNA fragment that was produced by PCR amplification from EST-AW414102 (GenBank), using the primers 5'-ACTTTACAGTTGCTAGATAAGC-3' and 5'-GTCCAATGACGAAGCGACAC-3'. The amplified fragment, corresponding to genomic DNA between -1603 and -903 relative to the start of ATE1 exon 1B, was 32P-labeled using the RediprimeII random prime labeling system (Amersham Biosciences) according to the manufacturer's protocol. Hybridization was carried out for 12 h at 68°C in ExpressHyb solution (Clontech). The blot was then washed twice for 15 min at room temperature in 2x SSC, 0.1% SDS, once for 15 min at room temperature in 0.1x SSC, 0.1% SDS, and once for 30 min at 50°C in 0.1x SSC, 0.1% SDS, followed by autoradiography.

Fig. 2 legend (partial): Mus musc., mouse; Rattus norv., rat; Homo sap., human; Gallus gall., chicken. D, bidirectional promoter upstream of the ATE1 exon 1B. The indicated sequence elements at the 5'-end of the mouse ATE1 gene include the (alternative) exon 1A (87 bp; question mark denotes the currently unknown location of promoter element(s) for transcripts that include exon 1A); the (alternative) exon 1B (108 bp); an ~200-bp, high CpG DNA segment immediately upstream of exon 1B that functions as a bidirectional promoter (see the main text); and exon 2 (63 bp) of the multiexon ATE1. The stated sizes (in bp) of exons 1A and 1B refer to the protein-coding lengths; their actual lengths, which include their 5'-untranslated regions, remain to be determined. Green arrows indicate transcriptional units, including a previously uncharacterized gene, oriented head-to-head relative to ATE1 and transcribed, at least in part, from the bidirectional promoter (see the main text).
Luciferase Gene Fusions and Analysis of the ATE1 Promoter-pGL3-empty was constructed by removing the PSV40 promoter from the pGL3 promoter vector (Promega, Madison, WI), by digesting it with KpnI and HindIII, blunt-ending the resulting larger fragment with T4 DNA polymerase, and circularizing it using T4 DNA ligase. The pGL3-derived plasmids pCB30, pCB40, pCB41, pCB42, pCB43, pCB44, pCB107, and pCB108 were constructed as follows. Various 5'-untranslated regions of the mouse ATE1 gene were amplified by PCR using the primer sets indicated below. Upon amplification, a 5'-KpnI site and a 3'-HindIII site were incorporated into the PCR-produced fragments. These (modified) genomic DNA fragments were used to replace the entire PSV40 promoter in the pGL3 promoter vector, by digestion with KpnI and HindIII and ligation with T4 DNA ligase. The pGL3-derived plasmids pCB59F and pCB59R were constructed by PCR amplification of mouse genomic DNA (regions 1 and 2; see "Results and Discussion"), from 223 to 32 bp upstream of the ATE1 exon 1B, using primers CB34 and CB33 and producing a HindIII site at both the 5'- and 3'-ends of the resulting fragment. It was then digested with HindIII and ligated to KpnI/HindIII-cut pGL3 promoter vector. This step yielded a mixture of nicked plasmids containing the above fragment (in either orientation) directly upstream of the firefly luciferase gene. The sample was then treated with T4 DNA polymerase and ligated using T4 DNA ligase. The above HindIII-flanked PCR fragment was also subcloned into pCB30, followed by digestion with HindIII, producing pCB60F and pCB60R. To minimize variability caused by differences in transfection efficiency, we cotransfected pGL3-derived plasmids, expressing the firefly luciferase gene, along with the pRL-SV40 plasmid (Promega), expressing the Renilla luciferase gene.
The firefly and Renilla luciferases have dissimilar substrate requirements, making it possible, with the dual-luciferase reporter assay system (Promega), to measure their luminescence separately for each enzyme.
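A common way to use the two readings, shown here as a minimal sketch rather than as the exact calculation performed in this study, is to divide the firefly signal by the Renilla signal from the same well and then express each construct relative to a control construct. The function names and the luminescence values below are hypothetical.

```python
# Minimal sketch of dual-luciferase normalization (assumed, standard approach).
# Readings are (firefly, renilla) luminescence pairs in arbitrary units.

def normalized_activity(firefly, renilla):
    """Firefly luminescence corrected for transfection efficiency."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def relative_to_control(sample, control):
    """Normalized activity of a test construct relative to a control construct."""
    return normalized_activity(*sample) / normalized_activity(*control)


# Hypothetical readings, not data from this study.
control = (2000.0, 1000.0)         # control-promoter construct
test_construct = (3500.0, 500.0)   # test construct, lower transfection efficiency

print(relative_to_control(test_construct, control))  # 3.5
```

In this made-up example the raw firefly signal of the test construct is less than twice that of the control, but after correction for the lower Renilla signal its promoter activity is 3.5-fold higher.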
Splicing-derived Isoforms of Mouse ATE1-Work by this laboratory has shown that prokaryotes, in addition to containing L-transferases (encoded by aat genes) (44,45) that are nonsequelogous to the ATE1-encoded eukaryotic R-transferases, also contain a distinct family of bpt-encoded L-transferases that are sequelogs (48) of eukaryotic ATE1 R-transferases (see Introduction) (47). One of the most highly conserved sequences in ATE1 R-transferases is CGYC. It is present near N termini of these enzymes (Fig. 2C) and is important for their activity. Although all examined R-transferases (they are confined, so far without exception, to eukaryotes) contain CGYC, a more general motif, CXYX (with restrictions on the identities of variable residues), is conserved in the eukaryotic/prokaryotic ATE1/Bpt superfamily of aminoacyltransferases. A search in databases for mammalian sequelogs of CGYC-containing sequences identified an expressed sequence tag (EST) that was later found to correspond to the alternative first exon of ATE1, termed 1A (Fig. 2, A-C).
To determine whether in vivo transcripts containing other parts of ATE1-coding sequence contained exon 1A as well, RT-PCR was carried out using primers derived from both exon 1A and the sequences encoding the C termini of the alternative exons 7A or 7B of the previously characterized (34) isoforms ATE1 1B7A and ATE1 1B7B. The results, using mouse brain mRNA (Fig. 3 and supplemental Fig. S1), showed that exon 1A, similarly to its previously characterized sequelog exon 1B, was present in transcripts of the expected length containing exon 7A or exon 7B, two previously characterized (34) alternative (and sequelogous) exons of equal sizes. We concluded that the in vivo synthesis and processing of ATE1 pre-mRNA produces at least four distinct mRNAs, ATE1 1A7A, ATE1 1A7B, ATE1 1B7A, and ATE1 1B7B. Among the 10 sequenced ATE1 cDNA isolates produced by RT-PCR (using exon 1A-specific forward primer and exon 12-specific complementary reverse primer), 9 isolates corresponded to ATE1 1A7A mRNA and only one corresponded to ATE1 1A7B mRNA, suggesting lower levels of the latter species in brain mRNA (see below).
RT-PCR was also employed to characterize the (approximate) relative levels of mouse ATE1 mRNA isoforms. The previously identified (34) ATE1 1B7A mRNA was found to be a major species in every examined tissue except for spleen and skeletal muscle, whereas ATE1 1B7B, the other previously known isoform (34), was expressed at comparable or slightly lower levels in the brain, liver, and testis, was a minor species in the spleen, muscle, and heart, and was virtually absent from the kidney, in contrast to ATE1 1B7A (Fig. 3). The third isoform, ATE1 1A7A, was the major species in the muscle and was relatively high in the testis, kidney, heart, and spleen but was present at lower levels in the brain, and at still lower levels in the liver (Fig. 3). ATE1 1A7B, the fourth isoform, was expressed at readily detectable levels only in the kidney, testis, and muscle (Fig. 3), in agreement with low frequency of full-length ATE1 1A7B isolates produced by RT-PCR from brain mRNA.
Two additional species, of slightly larger than expected sizes, were amplified by RT-PCR from the muscle- and testis-derived samples specific for ATE1 1A7B, and from the testis-derived samples specific for ATE1 1B7B (Fig. 3 and supplemental Fig. S1). DNA sequencing revealed that these species corresponded, respectively, to ATE1 1A7AB and ATE1 1B7AB mRNAs, encoding slightly longer ATE1 isoforms that differed by the first alternative exons, 1A and 1B, and contained both of the alternative internal exons, 7A and 7B (Fig. 2, A and B). The removal of the (shorter) intron between the exons 7A and 7B, instead of a (longer) intron that in addition contained either 7A or 7B (Fig. 2A), lengthened but did not frameshift the ATE1 open reading frame. The levels of ATE1 1A7AB in the muscle and testis were approximately equal to those of the (shorter) ATE1 1A7B isoform, whereas ATE1 1B7AB was significantly less abundant than ATE1 1B7B in the testis, the only tissue where ATE1 1B7AB was detectable (Fig. 3). The finding of ATE1 1A7AB and ATE1 1B7AB increases the total number of known ATE1 mRNA isoforms to six (Fig. 2B). Whether all or only some of the mouse ATE1 isoforms (identified as mRNAs and characterized as recombinant proteins expressed in E. coli or S. cerevisiae) exist as naturally translated proteins in the mouse remains to be determined, because the current polyclonal antibody to mouse ATE1 was raised against one isoform, the full-length ATE1 1B7A (26), and would be expected to recognize all six isoforms (Fig. 2B). The larger isoforms, ATE1 1A7AB and ATE1 1B7AB, would be expected to differ from the other four (identically or similarly sized) ATE1 isoforms by at most ~5 kDa, and therefore size differences alone would not suffice for identification of the isoforms in mouse cells. Isoform-specific antibodies would be required to address these issues definitively. Work to produce such antibodies is under way.
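The count of six isoforms and the size estimate can be checked with a short calculation. The sketch below enumerates the combinations of first exon (1A or 1B) and internal exon arrangement (7A, 7B, or both), confirms that retaining a second 129-bp exon keeps the reading frame, and estimates the added mass. The average residue mass of 110 Da is a rule-of-thumb assumption introduced here, not a value from this study.

```python
from itertools import product

# Alternative first exons and internal exon arrangements described in the text.
FIRST_EXONS = ("1A", "1B")
INTERNAL_EXONS = ("7A", "7B", "7AB")   # 7AB retains both alternative exons

isoforms = [f"ATE1 {first}{internal}"
            for first, internal in product(FIRST_EXONS, INTERNAL_EXONS)]

EXON7_BP = 129                          # length of exon 7A and of exon 7B
in_frame = EXON7_BP % 3 == 0            # a second exon adds whole codons only
extra_residues = EXON7_BP // 3          # residues added in the 7AB isoforms

AVG_RESIDUE_KDA = 0.110                 # assumption: ~110 Da per residue
extra_mass_kda = extra_residues * AVG_RESIDUE_KDA

print(isoforms)
print(in_frame, extra_residues, round(extra_mass_kda, 2))
```

Under these assumptions the two longer isoforms gain 43 residues, or roughly 4.7 kDa, consistent with the statement that they differ from the other four by at most ~5 kDa.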
Our previous study (34) had shown that, in contrast to highly similar ATE1 exon/intron patterns in the mouse and human genomes, including the presence of alternative exons 7A and 7B, both the Drosophila melanogaster (arthropod) and Arabidopsis thaliana (plant) ATE1 genes lacked the alternative internal exon 7A/7B arrangement. Drosophila contains just one but still separate type 7 exon, whereas Arabidopsis contains a type 7 exon as a part of a larger exon, the result of a fusion of several exons that are separate in vertebrates. Examination of the sequenced Drosophila and Arabidopsis genomes confirmed the earlier conclusions and indicated that these nonvertebrate eukaryotes also lacked the mammalian ATE1 pattern of alternative first exons 1A/1B. Drosophila and Arabidopsis contain only one such exon, and pufferfish (Takifugu rubripes), a vertebrate, apparently has a single first exon too (data not shown), in contrast to mammals and birds (Fig. 2, A-C).
To characterize ATE1 isoforms as enzymes and components of the N-end rule pathway, we asked whether each of them could confer metabolic instability, in ate1Δ S. cerevisiae (lacking its own R-transferase), on Asp-βgal, produced by the Ub fusion technique (65) from Ub-Asp-βgal and bearing N-terminal Asp, a secondary destabilizing residue (Fig. 1A and Introduction). Previous work (28,34,40,63,66) has shown that in yeast the steady-state level of an X-βgal reporter is a sensitive measure of its metabolic stability. SGY3 S. cerevisiae (ate1Δ) was cotransformed with a low copy plasmid expressing an isoform of mouse ATE1 from the PGALS promoter and a high copy plasmid expressing Asp-βgal (Ub-Asp-βgal) from the PGAL1 promoter. Controls included either vector alone or an otherwise identical plasmid expressing S. cerevisiae ATE1. The levels of X-βgal proteins were determined by measuring the enzymatic activity of βgal in cell extracts. Using this assay (28,34,40,63,66), we found that, as expected, the previously characterized isoforms ATE1 1B7A and ATE1 1B7B (34) conferred metabolic instability on Asp-βgal, with ATE1 1B7B being less active than ATE1 1B7A (Fig. 4B), in agreement with both the earlier evidence (34) and a separate and direct arginylation test (see below). Furthermore, ATE1 1A7AB and ATE1 1B7AB, the fifth and sixth isoforms (Fig. 2B) that contained both exons 7A and 7B, were much less active than the other four isoforms (Fig. 4B). Control immunoblotting tests (data not shown), using antibody to mouse ATE1 (26), confirmed that at least the ATE1 1A7A, ATE1 1B7A, and ATE1 1B7B isoforms (Fig. 2B) were expressed at similar levels (less than 2-fold differences) in transfected ate1Δ ubr1Δ S. cerevisiae that yielded the results in Fig. 4B.
Finally, ATE1 1BΔ(7AB) and ATE1 1BΔ(7AB), two artificial (engineered) deletion derivatives of the active R-transferases ATE1 1B7A and ATE1 1B7B that lacked both of the internal exons 7A and 7B, yielded the levels of Asp-βgal that were indistinguishable from those in ate1Δ yeast that lacked arginylation, strongly suggesting an essential requirement for the (alternative) internal exons 7A or 7B (Fig. 4B).
In a different approach to some of these questions, we employed an enzymatically direct in vitro test (as distinguished from the degradation-coupled in vivo assay (Fig. 4B)). The test used [3H]Arg, purified ATE1 isoforms, and α-lactalbumin as a reporter bearing N-terminal Glu, one of three secondary destabilizing residues (Fig. 1A). In this previously described assay (26,36,47), identical amounts of purified mouse ATE1 1A7A, ATE1 1B7A, and ATE1 1B7B were incubated with α-lactalbumin, [3H]Arg, and an extract from ATE1-/- mouse EF cells (which lacked R-transferases), followed by detection of the arginylated reporter by SDS-PAGE and fluorography. ATE1 1A7A and ATE1 1B7A were highly active as R-transferases in this assay (Fig. 4A). In contrast, the activity of ATE1 1B7B was negligible at the same level of the assay's sensitivity (Fig. 4A) but was detectable (albeit weakly) in a different [3H]Arg-based arginylation assay, with ATE1-/- embryo extracts (Fig. 6A), in agreement with the weaker but detectable activity of ATE1 1B7B in the yeast-based reporter-degradation assay (Fig. 4B).

Fig. 4 legend (partial): [...] arginylation assay (see "Experimental Procedures") but without added R-transferase, followed by SDS-PAGE and fluorography. Lane 2, same but with ATE1+/+ extract. Lanes 3-5, same but with the added (purified) ATE1 1B7A, ATE1 1B7B, and ATE1 1A7A, respectively, at 1 µg each. Lane 6, same as lane 2 but with α-lactalbumin (bearing N-terminal Glu). Lane 7, same but with ATE1-/- extract. Lanes 8-10, same as lanes 3-5, but with α-lactalbumin (the electrophoretic position of its arginylated derivative is indicated by asterisk). Note the absence of activity of ATE1 1B7B at the level of sensitivity of this assay (see the main text). B, the six isoforms of R-transferase can metabolically destabilize, with different efficiencies, an otherwise long lived Asp-βgal reporter protein in ate1Δ S. cerevisiae (see "Experimental Procedures"). Values are the means ± S.D., from three independent measurements. The specific isoforms, as well as two deletion derivatives of ATE1, are indicated below the diagram. C, determination, through Edman degradation, of the N-terminal sequence of isolated Asp-βgal reporter that had been expressed in ate1Δ ubr1Δ S. cerevisiae either without or together with ATE1 1A7A.
The alternative splicing of mouse ATE1 pre-mRNA(s) involves two pairs of the alternative and sequelogous (48) exons, 1A/1B and 7A/7B (34) (Fig. 2C). Interestingly, the alternative exons 7A and 7B (Fig. 2A) (34) contain 5'- and 3'-splice junction consensus sequences that are characteristic of introns rather than exons (data not shown), suggesting the origins of these alternative exons, over evolutionary time, from introns, through a set of processes called "exonization" (67). Are the two pairs of alternative exons in the ATE1 genes of mammals and birds (Fig. 2C) a relatively recent feature, acquired during the evolution of vertebrates, or does the alternative-exon organization of ATE1 predate the divergence of animals and plants, having been lost in most lineages but retained in some (not all) extant vertebrate species? One difficulty in addressing this issue definitively is that positive and negative selection pressures are not the only evolutionary forces at work in such settings, where random drift, through almost neutral mutations, contributes significantly as well, especially in organismal lineages with relatively low population sizes such as vertebrates (68). The evolutionary understanding of ATE1 would be advanced by the discovery of specific functions of R-transferase isoforms (Fig. 2B).
While our findings about the isoforms of mouse ATE1 were being prepared for publication, Rai and Kashina (61) published a paper that described the identification of two of these isoforms, which they termed ATE1-3 and ATE1-4. These isoforms are identical to ATE1 1A7A and ATE1 1A7B, respectively, of the present work (Fig. 2B). Two additional new isoforms described in the present work, ATE1 1A7AB and ATE1 1B7AB (Figs. 2B, 3, and 4), bring the current number of mouse R-transferase isoforms to six. In addition to describing two ATE1 isoforms, Rai and Kashina (61) presented evidence for two specific properties of ATE1 R-transferases. First, they concluded that the two new R-transferase isoforms (ATE1 1A7A and ATE1 1A7B in the current terminology; see Fig. 2B) could not arginylate substrates bearing N-terminal Asp or Glu (61), in contrast to the previously characterized isoforms ATE1 1B7A and ATE1 1B7B (34). Second, they concluded, on the basis of circumstantial evidence (cycloheximide-based pulse-chases with mouse ATE1-/- EF cells and the RGS4 protein reporter), that all four R-transferase isoforms could arginylate unmodified N-terminal Cys (61).
Our data do not support the above conclusions. Specifically, both ATE1 1A7A and ATE1 1A7B (which were described by Rai and Kashina (61) as inactive with N-terminal Asp and Glu) are actually active with these residues, as demonstrated by three different tests (Fig. 4, A-C). At least one of these tests, the yeast-based X-βgal activity assay (Fig. 4B), is identical to the assay used by Rai and Kashina (61). As mentioned above, the control immunoblotting tests (data not shown), using antibody to mouse ATE1 (26), confirmed that ATE1 1A7A, ATE1 1B7A, and ATE1 1B7B (Fig. 2B) were expressed at similar levels (less than 2-fold differences) in the transfected ate1Δ ubr1Δ S. cerevisiae. We further verified these results by employing a second assay, also used by the above authors (61), namely the N-terminal sequencing of X-βgal reporters. Asp-βgal was expressed in ate1Δ ubr1Δ S. cerevisiae (deletion of UBR1 precluded degradation of arginylated Asp-βgal) in the presence of (coexpressed) ATE1 1A7A. The resulting βgal reporter was isolated, purified, and sequenced by Edman degradation, as described previously (26,34). More than 90% of Asp-βgal isolated from ate1Δ ubr1Δ S. cerevisiae that coexpressed ATE1 1A7A was found to be the arginylated Arg-Asp-βgal (Fig. 4C), thus directly confirming the conclusion from yeast-based X-βgal activity assays (Fig. 4B).
We do not know the reason for this direct discrepancy between our experimental data (Fig. 4) and the data by Rai and Kashina (61). Because no controls verifying that R-transferases were actually expressed (as proteins) in the tester yeast strain were mentioned in the cited study, one possibility (which, if correct, would account for the above discrepancy) is that the apparent lack of activity reported by the authors (61) stemmed from negligible expression levels of the tested mouse ATE1s in S. cerevisiae.
Rai and Kashina (61) also observed that mouse RGS4, an N-end rule substrate that bears N-terminal Cys and is targeted for degradation by the arginylation branch of the N-end rule pathway (26,35,37) (see Introduction), was short lived in arginylation-lacking ATE1-/- mouse EF cells that had been transiently transfected with cDNAs encoding any one of the four tested mouse ATE1s. From this (inherently indirect) evidence, they concluded that mouse R-transferases, in contrast to S. cerevisiae R-transferase, could mediate the arginylation of unmodified N-terminal Cys (61). However, our earlier study has shown that RGS4 isolated from mouse cells that had been treated with proteasome inhibitor (to prevent the destruction of arginylated RGS4) was not only arginylated but also contained a modified Cys residue at position 2, specifically Cys sulfonate (36). This result was followed by our recent study that demonstrated, using direct enzymatic tests with purified R-transferases, that the activity of either mouse or S. cerevisiae ATE1s toward unmodified N-terminal Cys was negligible and that NO-mediated oxidation of N-terminal Cys was required for its (subsequent) arginylation, both in vitro and in vivo (26).
The ATE1 Promoter-Analyses, using PromoterInspector (Genomatix) of the ~20-kb segment upstream of mouse ATE1, pinpointed an ~700-bp region immediately upstream of exon 1B that appeared to comprise a part of ATE1 promoter (P ATE1). To identify putative cis-acting sequences of P ATE1, we tested fragments of genomic DNA that encompassed the above region, using a double-luciferase reporter system and transient cotransfections of NIH-3T3 cells. The measured levels of the firefly luciferase reporter, whose expression was driven by various DNA regions upstream of exon 1B (Fig. 2D), were compared with the levels of firefly luciferase driven by the control (P SV40) promoter (Fig. 5, B, D and E). Whereas constructs containing the segment from -1,603 to -543 bp (pCB30; region 3) or the segment from -2,083 to -543 bp (pCB40; region 3-4) upstream of exon 1B were unable to drive luciferase expression, the constructs that contained the segment -2,083 to -223 bp (pCB41; region 2-4) or the segment from -1,603 to -223 bp (pCB43; region 2-3) yielded significant but modest levels of luciferase expression, ~50 and 30% of the control (P SV40-conferred) levels, respectively (Fig. 5, B, D, and E). In contrast, the segment from -2,083 to -32 bp (pCB42; region 1-4) produced ~3.5-fold higher levels of luciferase than the control (P SV40) promoter, with a further increase to ~4.5-fold for regions 1-3, from -1,603 to -32 bp (pCB44) (Fig. 5, B, D, and E). The latter result suggested that the region from -2,083 to -1,603 bp (region 4) contained motifs that repressed transcription. In summary, the addition of a 192-bp fragment, located between -223 and -32 bp (region 1), to the -2,083/-223-bp segment (region 2-4) increased luciferase expression by ~6-fold (compare pCB42 and pCB41) and by ~14-fold in the absence of the repressive region 4 (compare pCB44 and pCB43) (Fig. 5, B, D, and E).
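The fold changes quoted above are ratios of luciferase levels expressed relative to the P SV40 control. A minimal Python sketch of that arithmetic follows. It uses only the approximate (rounded) relative levels stated in the text, and the dictionary and function names are illustrative. Because the inputs are rounded, the computed ratios (about 7 and 15) only approximate the ~6-fold and ~14-fold values reported, which were presumably derived from the unrounded measurements.

```python
# Approximate luciferase levels relative to the P SV40 control (= 1.0),
# taken from the rounded values quoted in the text. Names are illustrative.
RELATIVE_LEVELS = {
    "pCB41": 0.5,  # region 2-4
    "pCB43": 0.3,  # region 2-3
    "pCB42": 3.5,  # region 1-4
    "pCB44": 4.5,  # region 1-3
}


def fold_change(numerator, denominator, levels=RELATIVE_LEVELS):
    """Ratio of reporter levels between two constructs."""
    return levels[numerator] / levels[denominator]


# Effect of adding region 1 when region 4 is present.
print(round(fold_change("pCB42", "pCB41"), 1))
# Effect of adding region 1 when region 4 is absent.
print(round(fold_change("pCB44", "pCB43"), 1))
# Effect of removing region 4; a value above 1 is consistent with
# region 4 containing repressive motifs.
print(round(fold_change("pCB44", "pCB42"), 2))
```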
The 192-bp region-1 segment could drive luciferase expression by itself, at the level comparable with that of the control (P SV40 ) promoter. This region, directly upstream of exon 1B, sufficed as a promoter under these conditions (compare pCB59F to pSV40; Fig. 5, B and E). Similarly to the sequences of ATE1 exons, the sequence of this promoter region is highly conserved among the human, mouse, rat, and other examined vertebrate genomes, whereas DNA sequences outside this region are much less conserved, as determined using percent identity plot. Interestingly, this promoter motif is located between the alternative exons 1A and 1B, indicating that additional promoter elements must exist upstream of (alternative) exon 1A that drive the expression of ATE1 mRNAs containing exon 1A (Fig. 2D and Fig. 5, A and B).
Promoter-relevant regions predicted by PromoterInspector encompassed, in addition to the region 1, about 500 bp of the adjacent upstream sequence as well. The addition of the region 2 (-543/-223 bp) yielded an ~4-fold increase in luciferase levels over that of the region 1 alone (compare pCB59F to pCB107) (Fig. 5, B, D and E). Furthermore, although the much larger region 1-3 (-1,603/-32 bp) (pCB44) conferred luciferase levels ~4.5-fold higher than the control (P SV40) promoter, the otherwise identical segment that lacked the region from -543 to -223 bp (pCB60F; region 1 and 3) yielded luciferase levels that were only ~1.6 times higher than those by P SV40 or by the 192-bp fragment alone (Fig. 5, B, D, and E). We concluded that positive promoter elements of P ATE1 encompassed both the critical 192-bp region 1 (-223/-32 bp) and the adjacent upstream region 2 (-543/-223 bp) (Fig. 2D and Fig. 5, A and B).
FIGURE 5. Detection and analysis of a bidirectional promoter element in the mouse ATE1 gene. A, diagram of (G + C) content of genomic DNA (~30 kb) encompassing the 5′-region of ATE1 reveals a CpG island between the alternative exons 1A and 1B. Positions of the first nine ATE1 exons (1A through 7B) are shown below the (G + C) pattern. The first nucleotide of ATE1 exon 1B is denoted as +1. B, the 192-bp segment immediately upstream of ATE1 exon 1B can function as bidirectional promoter. The diagram shows segments of genomic DNA upstream of ATE1 exon 1B that were tested for their ability to direct the expression of luciferase gene in transfected mouse cells. C, Northern analysis of expression of a previously uncharacterized mouse gene adjacent to ATE1 and transcribed in the opposite direction 6 (see Fig. 2D and "Experimental Procedures"). The arrow indicates the main RNA species, with strongest expression in the testis. Actin mRNA probe was used to verify the uniformity of total RNA inputs. D and E, expression of luciferase, measured using a dual luciferase assay with extracts from NIH-3T3 cells transfected with pGL3-based plasmids that contained upstream DNA segments indicated in B. Luciferase levels are plotted in relative light units, normalized against the sample with P SV40 plasmid, arbitrarily assigned 1 light unit. Values are the means ± S.D., from three independent measurements.
The observed spatiotemporal complexity of ATE1 expression in both embryonic (36) and adult tissues, 5 as well as the presence of (alternative) exon 1A upstream of the currently defined promoter region, imply that the cis-acting elements identified so far (Fig. 2D and Fig. 5B) are but a part of the multisite, modular P ATE1 promoter, whose complete span remains to be determined. Our attempts, based on the dual-luciferase assay, to identify transcription-enhancing promoter elements upstream of exon 1A were unsuccessful thus far (pCB30 and pCB40; Fig. 5, B, D, and E). In addition, analyses of the ~10-kb region upstream of exon 1A by PromoterInspector did not yield candidate sequences either significantly or immediately upstream of exon 1A. Further work to identify such sequences is under way.
Bidirectionality and CpG Island of the ATE1 Promoter-The mean G + C content of ~30 kb of the mouse genomic DNA, from ~10 kb upstream to ~20 kb downstream of ATE1 exon 1B, was ~40%. In contrast, an ~800-bp region containing both exons 1A and 1B, from ~680 bp upstream of exon 1B to ~120 bp downstream of exon 1B, had the mean G + C content of 75% (Fig. 5A and supplemental Fig. S2). Closer inspection identified 85 CpG dinucleotide repeats in this region (Fig. 5A and supplemental Fig. S2). About half of these CpGs resided in a segment directly upstream of the ATE1 exon 1B that includes the highly conserved 192-bp region 1, shown to function as the core promoter element between the alternative exons 1A and 1B (Fig. 2D and Fig. 5A). High density of CpG repeats is a characteristic feature of previously identified bidirectional promoters in metazoan genomes (69-77). Although the number of actually characterized bidirectional promoters remains low, current estimates suggest that more than 10% of genes in a mammalian genome may be present as closely apposed head-to-head pairs transcribed from bidirectional promoters (75-77).
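The (G + C) content and CpG counts discussed above are straightforward to compute from a DNA sequence. The following Python sketch illustrates the calculation under stated assumptions: the function names and the toy sequence are hypothetical (the actual ATE1 genomic sequence is not reproduced here), and the observed/expected CpG ratio is a commonly used CpG-island statistic that is not stated to have been used in this work.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cpg_count(seq) * len(seq) / (c * g)


# Hypothetical toy sequence, used only to exercise the functions.
toy = "GCGCGGCGCCATCGCGGGCGCTA"
print(f"G + C content: {gc_content(toy):.2f}")
print(f"CpG dinucleotides: {cpg_count(toy)}")
print(f"CpG observed/expected: {cpg_obs_exp(toy):.2f}")
```

Applied in a sliding window along a genomic segment, the same functions would yield a (G + C) profile of the kind diagrammed in Fig. 5A.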
The 192-bp region-1 DNA segment was indeed found to drive the expression of a luciferase reporter in both orientations (Fig. 5, B, D, and E). Moreover, linking the region 3 (-1,603/-543 bp) to the oppositely oriented region 1 enhanced the expression of luciferase, similarly to the effect of region 3 (-1,603/-543 bp) on the naturally oriented region 1 segment (compare pCB59R to pCB60R and pCB59F to pCB60F; Fig. 5, B and E). We also made constructs, pCB107 and pCB108, that contained the entire region 1-2 (-543/-32 bp) in either orientation (Fig. 5B). Whereas pCB107 conferred high levels of luciferase expression, the same fragment in the opposite orientation was essentially inactive, indicating that the property of bidirectionality is confined mostly to the smaller (192 bp) CpG-rich region 1 promoter element embedded in a larger CpG island (Fig. 2D, Fig. 5, A, B, and D, and supplemental Fig. S2). Thus, the region 1, an evolutionarily conserved, CpG-rich genomic DNA fragment at the -223/-32 position relative to the start codon of the mouse ATE1 exon 1B can function as a bidirectional promoter in transient transfection assays with luciferase reporter.
To determine whether the CpG-rich region-1 bidirectional element functioned similarly in the context of genomic ATE1 locus, we carried out Northern hybridization of poly(A)+ RNA from adult mouse tissues, using a probe derived from genomic DNA upstream of the bidirectional promoter region. To detect transcripts that were distinct from those encoding ATE1 isoforms (e.g. those containing exon 1A), the probe was designed so that it did not overlap with either ATE1 exon 1A or its (expected) 5′-untranslated region, the latter suggested by information in the GenBank EST entries AW105867 and CJ144976 (data not shown). A low level ~2-kb transcript was detected in the heart, brain, spleen, lung, liver, and kidney but not in skeletal muscle (Fig. 5C). Although we still do not know the location of promoter elements that yield ATE1 transcripts containing exon 1A (see above), the ongoing analyses 6 of a new gene that is transcribed in the direction opposite to that of ATE1 made the assignment of transcripts described below definitive (data not shown). In particular (to cite just the evidence relevant to the specificity of detection of oppositely oriented transcripts by Northern), the exon 1A-containing ATE1 transcripts were clearly expressed in skeletal muscle, as detected by RT-PCR (Fig. 3, lanes 3 and 4). In contrast, the Northern probe used to detect transcripts of the oppositely oriented gene did not detect any such transcripts in the muscle, in contrast to other tissues (Fig. 5C).
Interestingly, the ~2-kb transcript was highly expressed in the testis, together with additional probe-hybridizing transcripts, of >4.5, ~2.3, and <1.35 kb (Fig. 5C). This and related results identified a set of novel transcripts in mouse cells that are encoded by a previously uncharacterized gene in the vicinity of the 192-bp bidirectional promoter element but in the opposite orientation to ATE1, and partially overlapping with it. Investigations of this gene and its products are underway. 6
Substrate Preferences of ATE1 1A7A Versus ATE1 1B7A-In part to attempt in vitro identification of new physiological substrates of the N-end rule pathway, and also because we wished to determine whether different R-transferase isoforms had preferences for specific substrates, we employed the [3H]Arg-based arginylation assay, using extracts from 12.5-day-old (E12.5) mouse embryos (26,36). To preclude the in vivo arginylation and to make certain that only a specific (added) isoform of R-transferase was responsible for arginylation in this in vitro assay, the extracts were prepared from previously described (36) ATE1-/- embryos, which lacked arginylation. Specific epitope-tagged R-transferase isoforms (ATE1 1A7A, ATE1 1B7A, and ATE1 1B7B; see Fig. 2B) were expressed in insect cells using recombinant baculoviruses and were purified to near homogeneity. The mouse embryo extracts used, termed S105, were supernatants after centrifugation of total extracts at 105,000 × g, a step that removes the bulk of ribosomes. Besides an S105 extract (either +/+ or ATE1-/-), the reaction mixture contained the following: [3H]Arg; a purified mouse R-transferase (ATE1 1A7A, ATE1 1B7A or ATE1 1B7B); puromycin to inhibit possible (residual) translation in S105; ATP regeneration system; a mixture of E. coli tRNAs and aminoacyl-tRNA synthetases to produce Arg-tRNA; the Arg-Ala dipeptide to inhibit possible ubiquitylation of arginylated proteins in the extract; MG132, a proteasome inhibitor, to reduce possible degradation of arginylated proteins; and a set of standard protease inhibitors. The reactions were carried out at 37°C for 1 h, followed by removal of unincorporated [3H]Arg and the processing of samples for electrophoresis and fluorography.
This approach stemmed from our earlier detection, using specific antibodies, of greatly increased in vivo levels of RGS4, RGS5, and RGS16 in ATE1-/- embryos (26). That in vivo-based finding, in conjunction with additional evidence, indicated that these regulatory proteins (see Introduction) were physiological substrates of the arginylation branch of the N-end rule pathway (26). This result also implied that comparing the levels of in vitro-arginylated proteins between otherwise identical extracts from wild-type (+/+) and ATE1-/- embryos would help distinguish, under such conditions, between (putative) physiological substrates of R-transferase and artifactually arginylated proteins, because the former, but not the latter, would be expected to be present at higher levels in ATE1-/- extracts, having been spared the in vivo arginylation and degradation. A problem that makes this ATE1-/--based approach particularly helpful stems from the fact that compartments such as the ER and Golgi contain a number of proteins that bear destabilizing (including arginylatable) N-terminal residues. These proteins had been cleaved by signal peptidase during their translocation into the ER, and the cleavage specificity of this peptidase often yields a destabilizing residue at the N terminus of a translocated, compartmentalized protein (23). Even if detergents are absent during preparation of extracts, a compartmentalized protein may partially "leak" (from membrane vesicles to be removed by centrifugation) into the cytosolic fraction and become arginylated in an in vitro assay. An ATE1-/- extract and a +/+ extract would be expected not to differ significantly by the levels of "leaked" proteins, in contrast to the levels of proteins that are arginylated and degraded by the N-end rule pathway in +/+ cells. (Note that some ER or Golgi proteins can be present, in vivo, in the cytosol and/or the nucleus as well. Thus, not all ER or Golgi proteins, if they are found to be arginylated in vitro, are necessarily artifacts; see below.)
In the absence of added R-transferases, no incorporation of [3H]Arg could be detected with ATE1-/- embryo extracts, confirming the absence of both R-transferases and (residual) translation in this setting (Fig. 6A, lane 1). The same test with +/+ embryo extracts showed low levels of [3H]Arg incorporation, mediated by endogenous R-transferases (Fig. 6A, lane 2). When a purified R-transferase, such as ATE1 1B7A, was added to the above assay, the resulting patterns of [3H]Arg incorporation were different between +/+ and ATE1-/- extracts, as could be shown by two-dimensional electrophoresis (Fig. 6, B-D). This result was obtained with ATE1 1A7A and ATE1 1B7A R-transferases (Fig. 6 and data not shown). Remarkably, this approach revealed that purified ATE1 1A7A or ATE1 1B7A, when added at identical levels to identical samples of the E12.5 ATE1-/- embryo extract, produced overlapping but significantly different patterns of arginylation, suggesting that these two R-transferases, which differ exclusively by their (sequelogous) N-terminal exons 1A and 1B (Fig. 2, B and C), have distinct preferences for specific protein substrates. Both of these R-transferases had comparable activities in the yeast-based in vivo degradation assay with an arginylation reporter such as Asp-βgal (Fig. 4B). However, the yeast-based assay, in which R-transferases were overexpressed, would have missed significant but not all-or-none differences in the levels of activity of specific R-transferase isoforms. Some of the differences between the ATE1 1B7A-specific and ATE1 1A7A-specific arginylation patterns are indicated by circles in Fig. 6D.
Thus, specific isoforms of R-transferase may have distinct functions. For example, although all of the isoforms are capable of arginylating N-terminal Asp, Glu, and (oxidized) Cys, the relative rates of arginylation, e.g. of Asp versus Glu, may differ among the isoforms. Yet another (nonalternative) possibility is that specific sequence and conformational features downstream of (arginylatable) N-terminal residues of a substrate may lead to physiologically relevant differences in the rate of substrate's arginylation by different isoforms of R-transferase. Our previous work (34) and more recent mutagenesis studies 7 indicated that N-terminal regions of R-transferases are essential for their activity and are likely to be a part of a substrate-binding domain. Thus, the distinct but sequelogous exons 1A and 1B, encoding N-terminal regions of ATE1 1A7A and ATE1 1B7A (Fig. 2, B and C), respectively, may confer physiologically relevant substrate preferences on these isoforms of R-transferase. In addition, different spatial locations of R-transferase isoforms in a cell (little is known about this at present (34)) may further modulate their intrinsic preferences for specific substrates. There is extensive precedent for the notion that different splicing-derived isoforms of an enzyme can have distinct locations in a cell and thereby different functions (78-83).
Putative N-end Rule Substrates GRP78 and PDI-Some of the spots of proteins that were arginylated by ATE1 1A7A but not by ATE1 1B7A in ATE1-/- extracts (Fig. 6, C and D) were excised and sequenced, using digestion with trypsin and MS/MS mass spectrometry (see "Experimental Procedures"). Two of these arginylated proteins were found to be GRP78 (glucose-regulated protein 78) and PDI, in agreement with their apparent molecular masses and isoelectric points in two-dimensional electrophoretic patterns (Fig. 6D and data not shown). Other proteins that were arginylated by ATE1 1A7A or ATE1 1B7A in ATE1-/- but not in ATE1+/+ embryo extracts (Fig. 6 and data not shown) remain to be identified. Mammalian GRP78, also referred to as BiP, is an abundant 70-kDa protein, a member of the family of HSP70 stress proteins, and a major chaperone in the lumen of ER, where the bulk of GRP78 resides (84). Mammalian PDI (PDIA1) is a 55-kDa protein-disulfide isomerase. It catalyzes disulfide bond formation, reduction, or isomerization, is a member of the family of structurally related PDI enzymes, functions also as a molecular chaperone, and resides largely in the lumen of ER (85). As nascent proteins, mammalian GRP78 and PDI bear N-terminal signal sequences that are cleaved by signal peptidase upon their translocation into the ER, yielding Glu and Asp, respectively, at the N termini of translocated GRP78 and PDI. Both of these N-terminal residues are Nd s residues in the N-end rule (Fig. 1A) and would therefore be expected to be arginylated by R-transferases, in agreement with our in vitro results (Fig. 6D).
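The reasoning above, in which cleavage of a signal sequence exposes a new N-terminal residue that may be a substrate of R-transferase, can be sketched in Python. The sequences and cleavage positions below are toy placeholders, not the actual GRP78 or PDI sequences, and the function names are hypothetical. The set of arginylatable residues (Asp, Glu, and oxidized Cys) is the one stated in this paper.

```python
# N-terminal residues arginylated by R-transferase, as stated in the text.
DIRECT = {"D": "Asp", "E": "Glu"}
# N-terminal Cys is arginylated only after its oxidation.
AFTER_OXIDATION = {"C": "Cys"}


def mature_n_terminus(precursor, signal_length):
    """Residue exposed at the N terminus after removal of a signal
    sequence of the given length (one-letter amino acid code)."""
    if not 0 <= signal_length < len(precursor):
        raise ValueError("signal_length must be within the sequence")
    return precursor[signal_length]


def arginylation_status(residue):
    """Classify an N-terminal residue with respect to R-transferase."""
    if residue in DIRECT:
        return "arginylatable"
    if residue in AFTER_OXIDATION:
        return "arginylatable after oxidation"
    return "not a direct R-transferase substrate"


# Toy precursors (hypothetical): a 5-residue "signal sequence" followed
# by a mature region that begins with Glu, Asp, or Ser.
for name, seq in [("toy-E", "MKLLAEEKDT"), ("toy-D", "MRLLADAPEE"), ("toy-S", "MKLLASDAPE")]:
    residue = mature_n_terminus(seq, 5)
    print(name, residue, arginylation_status(residue))
```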
The [3H]Arg-labeled spots of GRP78 and PDI were present in the extracts from ATE1-/- embryos (containing an added specific R-transferase) but not in extracts from ATE1+/+ embryos (containing a set of endogenous R-transferases) (Fig. 6, B-D). This strongly suggested that the in vitro-arginylated GRP78 and PDI did not result from artifactual "leakage" of ER proteins into the cytosolic fraction during preparation of these extracts, an inference consistent with the earlier lines of evidence that both PDI and GRP78 can also occur, in vivo, outside the ER (57-60).
Nevertheless, our evidence that the two proteins are physiological substrates of R-transferase (implying their cytosolic and/or nuclear localization), although strongly suggestive, is still indirect, because it remains formally possible that subsets of GRP78 and PDI might have leaked from the ER and were arginylated by endogenous R-transferases in the ATE1+/+ extract (but not in the ATE1-/- extract) before the addition of [3H]Arg, despite the near-0°C temperature during the processing of extracts and despite a likely decrease of endogenous Arg-tRNA under in vitro conditions, before the addition of aminoacyl-tRNA synthetases and tRNA.
Yet another possibility is that the absence of arginylation in ATE1-/- embryos may increase the expression levels of ER-localized GRP78 and PDI, in comparison to ATE1+/+ embryos. Note, however, that even if GRP78 and PDI were, in fact, induced in the absence of R-transferase, that effect alone would not preclude the possibility that these proteins are bona fide in vivo substrates of the N-end rule pathway. The "overexpression" interpretation was made unlikely by the data (not shown) that indicated similar total levels of, respectively, GRP78 and PDI in E12.5 ATE1-/- embryos versus ATE1+/+ embryos, as determined by immunoblotting with antibodies to GRP78 and PDI. Although informative otherwise, such data cannot distinguish between the levels of GRP78 and PDI in the ER (their major location) and the presumably much lower (and independently changing) levels of GRP78 and PDI at their non-ER locations in a cell (see below). This is an illustration of the difficulty in addressing, definitively, the biological meaning of the above in vitro results (Fig. 6, B-D). We should also mention that the above identification of two spots of arginylated proteins in Fig. 6 as GRP78 and PDI, although direct and robust as an MS/MS-based evidence, is, strictly speaking, an interpretation of data, because of the formal possibility that other (arginylatable) protein species comigrated with GRP78 or PDI in two-dimensional electrophoretic fractionations. Although unlikely for several reasons, this possibility is formally unexcluded. Having identified GRP78 and PDI as putative arginylation substrates, our further work in this area aims to bypass in vitro approaches, by developing an in vivo-based method to solve the problem of which GRP78 and PDI are the first examples: how to identify, in vivo, a subset of molecules of a given protein that may undergo arginylation (or other modifications) in the cytosol, for example, if the bulk of this protein resides in another compartment such as the ER.
Remarkably, there is independent and functionally relevant evidence for the in vivo location of a fraction of both GRP78 and PDI on the cytosolic side of the ER. Stockton et al. (57) identified a protein complex, termed the translocon-resident protein-disulfide isomerase complex (TR-PDI), that included both PDI and GRP78 and was present in the cytosol, where it interacted with the cytosolic face of SEC61, the major component of ER translocation channel. Specific aspects of the TR-PDI complex remain to be clarified, including its function in protein translocation, the paths that GRP78 and PDI take to form this complex, and the metabolic fates of TR-PDI's components, including GRP78 and PDI (57). For example, in order to form the TR-PDI complex, must these proteins be retrotranslocated from the ER into the cytosol, with their signal sequences cleaved off earlier? Or does TR-PDI consist of proteins that did not enter the ER, as yet, and therefore retained their signal sequences, bearing the initial, stabilizing N-terminal residues? One possibility is that the former route to TR-PDI is the one actually taken, at least with GRP78 and PDI. If so, these proteins bearing, respectively, Glu and Asp at the N termini (see above) may be arginylated and degraded by the N-end rule pathway upon, for instance, dissociation of the TR-PDI, if such is the dynamics of TR-PDI in vivo. This model is consistent with in vitro properties of the TR-PDI complex (57). One way to address these issues would be to determine whether the N termini of GRP78 and PDI in the TR-PDI complex are either already arginylated or can be arginylated upon dissociation of TR-PDI. One can also ask, using previously constructed mouse ATE1-/- cell lines (26,36), whether TR-PDI-derived GRP78 and PDI in these cells (a subset of total GRP78 and PDI) become short lived upon re-expression of R-transferase.
Another study, by Grune et al. (60), showed that transient treatment of a rat liver cell line with H2O2 resulted in a degradation-mediated decrease of PDI, one of two proteins identified above as putative in vivo substrates of R-transferase. The oxidation-induced in vivo degradation of PDI was mediated by the proteasome (60), and therefore must have taken place in the cytosol and/or the nucleus. The N-terminal sequence of PDI from these cells, determined by Edman sequencing, lacked the (initially present) signal sequence, thus identifying the bulk of sequenced PDI as either a luminal ER protein (the main location of PDI) or a protein retrotranslocated from the ER. In the N-terminally sequenced PDI, its (arginylatable) Asp-26 residue was the second residue, and the identity of the first residue was unclear (60). One possibility (not the only one) is that the first residue was not the encoded Ser-25 but the post-translationally conjugated Arg residue. If so, this (oxidatively damaged?) species of PDI could be a specific subset of PDI in a cell, retrotranslocated from the ER and arginylated by ATE1-encoded R-transferases, thus becoming a target for the rest of the N-end rule pathway and thereby possibly accounting for the findings of Grune et al. (60). This currently speculative model can be tested using approaches mentioned above in the context of GRP78.
Concluding Remarks-N-terminal arginylation, a protein modification universal among eukaryotes and apparently confined to them (47), is mediated by ATE1-encoded R-transferases and is a part of the N-end rule pathway, a Ub-dependent proteolytic system (Fig. 1A). The functions of the arginylation branch of the N-end rule pathway include the regulation of cardiovascular development and homeostasis (36,39), the regulation of signaling by NO and oxygen (26,37) (this regulation is likely to subserve, in particular, the pathway's cardiovascular functions), the regulation of apoptosis (51,52), the regulation of leaf senescence in plants (53), and the regulation of chromosome mechanics through the arginylation-dependent degradation of the separase-produced fragment of a subunit of mammalian cohesin 3 (see Introduction for additional information and references). Although it is possible that mouse ATE1 gives rise to more than six splicing-derived mRNA isoforms described above, including two new isoforms (Fig. 2, B-D), both our present work and preceding efforts to identify ATE1 isoforms (34,36,61) suggest that the current set may be the complete one. We do not know, as yet, whether R-transferase isoforms have distinct functions. The current circumstantial evidence suggests this possibility, given strong variations in expression levels of specific ATE1 mRNA isoforms in different mouse tissues (34) (Fig. 3), and also the observed differential activity of the ATE1 1A7A versus ATE1 1B7A isoforms toward putative physiological substrates (Fig. 6).
Another set of new findings reported and discussed above includes the discovery of bidirectionality of the ATE1 promoter (P ATE1), specifically the ability of an ~200-bp DNA segment (region 1) immediately upstream of the start codon of exon 1B (which encodes one of two alternative N termini of R-transferase; Fig. 2, B-D) to function as a promoter in both orientations. This short CpG-rich segment of P ATE1 is bidirectional not only in model (transfection-based) settings but also in mouse tissues in vivo, where it mediates, at least in part, the transcription of both ATE1 and an oppositely oriented, previously uncharacterized gene, which is expressed in several mouse tissues (Fig. 2D and Fig. 5D). Our findings about this (second) gene will be described elsewhere. 6 Although bidirectional promoters (whose features include the presence of CpG islands) appear to mediate the expression of more than 10% of genes in a mammal, relatively few examples of transcriptional bidirectionality have been characterized thus far in mammals, and even fewer bidirectional promoters were analyzed in detail (69-77). Our finding that P ATE1 is bidirectional adds an unexpected dimension to the understanding of this promoter, whose spatiotemporal and cell type-specific regulation is known to be complex 5 (36). Although the ~200-bp bidirectional promoter element is upstream of the start codon of the ATE1 exon 1B, it is downstream of the alternative ATE1 exon 1A (Fig. 1F). This arrangement is consistent with interesting possibilities, such as regulation of ATE1 and the (oppositely oriented) adjacent gene through transcriptional interference and/or, for example, the formation, through bidirectional transcription in this region, of double-stranded RNA molecules, a hallmark of RNA interference (86). One of many remaining questions about P ATE1 is the location of its element(s) that mediates the formation of mRNAs encoding ATE1 1A7A and the other two exon 1A-containing ATE1 isoforms (Fig. 2, B-D).
Yet another set of new results is the identification of two putative physiological substrates of mouse R-transferases that are distinct from the previously known (and definitively identified) RGS4, RGS5, and RGS16 (26,37) (see Introduction). The above RGS proteins are conditionally short lived proteins that are targeted for the arginylation and degradation via the NO-dependent activation of their Cys-based N-degrons. Given the tripartite structure of N-degrons, it is possible that some physiological substrates of R-transferase would be found to lack a complete N-degron, i.e. that their arginylation would not be followed by their degradation.
The two substrates of R-transferase described in the present work are GRP78 (BiP), an ER-localized chaperone, and PDI, another ER-localized enzyme. Both of these proteins are still putative N-end rule substrates, given their in vitro (but genetically based) identification (Fig. 6 and discussion above), and the localization of the bulk of these proteins in the ER, a compartment that lacks the N-end rule pathway (23). However, several lines of evidence, described above, suggest that both GRP78 and PDI can be present in other cellular compartments as well, including the cytosol and the nucleus (57-60), where these proteins may be N-end rule substrates. Whether this is actually so in vivo and the functions of degradation of GRP78 and PDI by the N-end rule pathway remain to be determined.
The findings of the present work about mouse ATE1, its bidirectional promoter, alternative splicing, the resulting R-transferase isoforms, and their putative physiological substrates raised a number of new questions. Studies to address them are under way.