-End Processing of Pre-mRNA*
(Received for publication, May 16, 1997, and in revised form, August 13, 1997)
From the Department of Cell Biology, Biozentrum of the University
of Basel, Klingelbergstrasse 70, CH-4056 Basel, Switzerland and the
European Molecular Biology Laboratory (EMBL),
Meyerhofstrasse 1, D-69117 Heidelberg, Federal Republic of Germany
Critical events in 3′-end processing of
pre-mRNA are the recognition of the AAUAAA polyadenylation signal
by cleavage and polyadenylation specificity factor (CPSF) and the
binding of cleavage stimulation factor (CstF) via its 64-kDa subunit to
the downstream element. The stability of this CPSF·CstF·RNA complex
is thought to determine the efficiency of 3′-end processing. Since
downstream elements reveal high sequence variability, in
vitro selection experiments with highly purified CstF were
performed to investigate the sequence requirements for CstF-RNA
interaction. CstF was purified from calf thymus and from HeLa cells.
Surprisingly, calf thymus CstF contained an additional, novel form of
the 64-kDa subunit with a molecular mass of 70 kDa. RNA ligands
selected by HeLa and calf thymus CstF contained three highly conserved
sequence elements as follows: element 1 (AUGCGUUCCUCGUCC) and two
closely related elements, element 2a
(YGUGUYN₀₋₄UUYAYUGYGU) and element 2b
(UUGYUN₀₋₄AUUUACU(U/G)N₀₋₂YCU).
All selected sequences tested functioned as downstream elements in
3′-end processing in vitro. A computer survey of the EMBL
data library revealed significant homologies to all selected elements
in naturally occurring 3′-untranslated regions. The majority of element
2a homologies was found downstream of coding sequences. Therefore, we
postulate that this element represents a novel consensus sequence for
downstream elements in 3′-end processing of pre-mRNA.
The primary transcripts (pre-mRNAs) of a eukaryotic cell
undergo several different maturation steps to become fully functional messenger RNAs (mRNAs). One of these maturation events is the 3′-end processing reaction, during which the pre-mRNA receives a
new 3′-end that is in almost all cases a poly(A) tail. First, the
pre-mRNA is endonucleolytically cleaved at the polyadenylation site
(poly(A) site). In a second tightly coupled event, the polyadenylation reaction, approximately 250 adenosine residues are added to the upstream cleavage product, whereas the downstream fragment is rapidly
degraded (for reviews, see Refs. 1-6).
In vivo and in vitro studies have revealed a
requirement for distinct sequence elements for 3′-end processing of
pre-mRNAs: a highly conserved AAUAAA sequence located upstream of
the poly(A) site and so-called downstream elements. Moreover, several
sequences located upstream of the AAUAAA signal have been shown to
enhance the cleavage reaction (for reviews, see Refs. 3, 4, 7).
Downstream elements show high sequence variability, and many different motifs have been proposed to contribute to downstream element function: YGUGUUYY (8), GUGUUG (9), CAYUG (10), AGGUUUUUU (11), UCCUGU (12), or simply UGU clusters (13). Recently, it was shown that a UUUUU element located 6-25 nucleotides downstream of the AAUAAA sequence is sufficient to confer cleavage activity to a substrate whose natural downstream region has been completely deleted (14). Due to the abundance of uracil and guanine residues in these motifs, downstream elements are usually referred to as U- or G/U-rich elements.
One of the best analyzed downstream regions is that of the SV40 late
pre-mRNA. Although it was not possible to identify single nucleotides that are essential for poly(A) site function (15), a
deletion of about 20 nucleotides downstream of the poly(A) site inhibits 3′-end processing (16-18). The SV40 late downstream element consists of two parts. Each part alone allowed efficient processing when the other part was substituted with unrelated polylinker sequence.
Only the substitution of both parts together inhibited cleavage of the
SV40 late pre-mRNA (17). Other bipartite downstream elements were
identified in the β-globin genes of rabbit (19) and mouse (20).
It has also been demonstrated that the distance between the AAUAAA signal and the downstream element is critical. Moving the downstream element further downstream can not only abolish cleavage but can also shift the cleavage site (14, 19, 21-25).
To date, six factors involved in the cleavage and polyadenylation
reactions have been identified as follows: cleavage and polyadenylation
specificity factor (CPSF), cleavage stimulation factor (CstF), cleavage
factors Im and IIm (CF Im, CF
IIm), poly(A) polymerase (PAP), and poly(A) binding protein
II (for reviews, see Refs. 1, 3, 4). Most of these factors have been
purified, and several have been cloned. Interestingly, many homologs to the mammalian 3′-end processing components have been found in yeast
suggesting a conserved mechanism in lower and higher eukaryotes (for
reviews, see Refs. 2 and 26).
The recognition of the AAUAAA sequence by CPSF is thought to be the
first step in the formation of a 3′-end processing complex. This
initial complex is stabilized by the subsequent binding of CstF to the
downstream element (27-29). The stability of this commitment or
ternary complex correlates with the efficiency of poly(A) site usage
(29). CF Im, CF IIm, and PAP then join to form
a fully active 3′-end processing complex.
CstF consists of three polypeptides with molecular masses of 50, 64, and 77 kDa (28, 30), all of which have been cloned (31-33). The 64-kDa subunit interacts with the downstream element of pre-mRNAs (28, 30, 31, 34).
A correlation between CstF activity and the usage of different poly(A)
sites was observed during the adenoviral life cycle (35) and mouse
B-cell development (36). It has been demonstrated that overexpression
of the 64-kDa subunit of CstF in stably transformed B-cells induces the
switch from the membrane-bound to the secreted form of immunoglobulins
via alternative polyadenylation (37). These results demonstrate that
CstF plays a critical role in 3′-end processing.
Since the sequence requirements for CstF-RNA interaction have only been
poorly characterized, we performed in vitro selection experiments (SELEX, Ref. 38) with CstF purified from calf thymus whole
cell extracts and HeLa cell nuclear extracts. Interestingly, CstF
purified from calf thymus contained an additional polypeptide with a
molecular mass of 70 kDa, which represents a novel form of the 64-kDa
subunit. CstF preferentially selected highly conserved sequence
elements rather than guanine- and/or uracil-rich sequences per
se. The selected sequences functioned as downstream elements in
3′-end processing in vitro, and homologies to them were
found in natural 3′-untranslated regions (3′-UTRs) of many genes,
suggesting a role of these sequences as downstream elements in
vivo.
Macroprep Q resin was purchased from Bio-Rad;
Blue Sepharose was prepared as described previously (39). All other
column resins and prepacked FPLC columns were from Pharmacia Biotech Inc., as well as RNAguard and m7GpppG. Phenylmethylsulfonyl
fluoride was purchased from Serva, leupeptin hemisulfate and Nonidet
P-40 from Fluka, pepstatin from Bachem, and ammonium sulfate from Life
Technologies Inc. All restriction enzymes, Moloney murine leukemia
virus reverse transcriptase, and polynucleotide kinase were from New
England Biolabs; creatine kinase, creatine phosphate, calf intestine
alkaline phosphatase, Klenow enzyme, and SP6 RNA polymerase were from
Boehringer Mannheim. T7 RNA polymerase was purchased from Stratagene,
and Taq DNA polymerase (AmpliTaq) was from Perkin-Elmer. DNA
sequencing was performed with Sequenase version 2.0 (United States
Biochemical Corp.). Cordycepin 5′-triphosphate (3′-dATP), dNTPs, and
NTPs were from Boehringer Mannheim; all radioactively labeled NTPs and
dNTPs were from Amersham Corp. Polyvinyl alcohol was purchased from Sigma, and dithiothreitol (DTT) was from GERBU Biotechnik GmbH.
Whole cell
extract from 2 kg of calf thymus was applied to two DEAE-Sepharose fast
flow columns as described previously (39). The flow-throughs were
pooled and precipitated with ammonium sulfate (50% saturation).
Consecutive backwashes with ammonium sulfate were performed (45, 25, and 20% saturation). The 20% ammonium sulfate pellet (10.7 g of
protein) was dialyzed and applied to Blue Sepharose columns. CstF
activity was further purified by heparin-Sepharose, Macroprep Q, Mono Q
FPLC, Mono S FPLC, Superose 6 FPLC, and poly(U)-Sepharose
chromatography. The final poly(U)-Sepharose column (1.5 ml) was
equilibrated with 35 ml of 0.3 M KCl in buffer G (50 mM Tris-HCl (pH 7.9), 0.5 mM EDTA, 10% v/v
glycerol, 0.02% v/v Nonidet P-40, 0.5 mM DTT, 0.5 mM phenylmethylsulfonyl fluoride, 0.4 µg/ml leupeptin,
0.7 µg/ml pepstatin) and developed with a 20-ml gradient from 0.3 to
2 M KCl. CstF activity eluted around 1 M KCl,
and the fractions were pooled and concentrated (Centricon-30, Amicon)
to a final protein concentration of 48 µg/ml. This fraction (Fig.
1A, lane 1) was used for selection experiments.
For the first purification of HeLa CstF, nuclear extracts (5) were prepared from 8.4 × 10¹⁰ HeLa cells (2.6 g of protein). The DEAE-Sepharose flow-through was precipitated with ammonium sulfate (80% saturation), and no further backwashes were performed. Macroprep Q, Mono S, and Superose 6 columns were omitted. Single fractions of the final poly(U)-Sepharose column that contained CstF activity were dialyzed against 20 mM KCl in buffer G. One fraction was used for the selection experiments (Fig. 1A, lane 2). The protein concentration of this fraction was 32 µg/ml. A second purification was from 5.8 × 10¹⁰ HeLa cells (approximately 2 g of protein). Nonidet P-40 was omitted; the DEAE-Sepharose flow-through was not precipitated with ammonium sulfate. The poly(U)-Sepharose column was loaded at a salt concentration of 0.25 M KCl and developed with a gradient (25 ml) from 0.25 to 2 M KCl. CstF containing fractions were dialyzed against 20 mM KCl in buffer G. The protein concentration of the fraction used for selection experiments (Fig. 1A, lane 3) was 64 µg/ml.
Purification of Other 3′-End Processing Factors
CPSF was purified from calf thymus (39), and recombinant bovine PAP was prepared as described previously (40). Crude fractions of CF Im and IIm were prepared as follows: HeLa nuclear extracts were diluted with buffer G to 75 mM KCl and applied to a DEAE-Sepharose fast flow column equilibrated with 75 mM KCl in buffer G. The column was developed with a gradient (10 column volumes) from 75 to 500 mM KCl. Fractions containing CF Im/IIm activity were pooled and loaded directly onto an 8-ml Mono Q FPLC column. The column was developed with a gradient (25 column volumes) from 100 to 500 mM KCl in buffer G. CF Im/IIm activity eluted between 250 and 300 mM KCl. These fractions were dialyzed against 100 mM KCl in buffer G and used for cleavage reactions.
Selection of RNA Ligands
DNA oligonucleotides were used to
transcribe RNA substrates for the first round of the SELEX procedure
(38). Oligo 1 (5′ TAGGCTAGGATCCATCTTGT(N20)ATCGTTCGTGAGCTCGTCCCTATAGTGAGTCGTATTACGCG 3′) contained a BamHI restriction site in the 5′ part,
a T7 RNA polymerase promoter sequence (underlined) and a
SacI restriction site in the 3′ part, and encoded an RNA of
60 nt (5′ GGGACGAGCUCACGAACGAU(N20)ACAAGAUGGAUCCUAGCCUA 3′). 4 pmol of both oligo 1 and oligo 2 (5′ CGCGTAATACGACTCACTATAGGG 3′; complementary to the T7 RNA polymerase promoter sequence) were annealed. Transcriptions were performed as recommended by the
manufacturer at 37 °C for 1 h, and transcripts were
gel-purified. Filter binding reactions (20 µl) contained 0.2 mM DTT, 0.01% v/v Nonidet P-40, 20 mM creatine
phosphate, 0.5 mM ATP, 1.5 mM
MgCl2, CstF as indicated and were incubated at 30 °C for
30 min. The reaction mixtures were filtered under vacuum through BA 83 nitrocellulose filters (Schleicher und Schüll) that had been
equilibrated with buffer W (50 mM KCl, 50 mM
Tris-HCl (pH 7.9), 1.5 mM MgCl2, 0.5 mM DTT) and saturated with 20 µg of Escherichia
coli total RNA. Filters were washed with 4 ml of ice-cold buffer
W. The selected RNAs were eluted from the nitrocellulose filters with
350 µl of urea elution buffer (41). The eluate was extracted with
phenol/chloroform and ethanol-precipitated, and the RNA pellet was
resuspended in 10 µl of H2O containing 50 pmol of oligo 3 (5′ TAGGCTAGGATCCATCTTGT 3′). After annealing of oligo 3 to the RNA,
reverse transcription was performed in a volume of 30 µl in the
presence of 25 units of RNAguard and 15 units of Moloney murine
leukemia virus reverse transcriptase as recommended by the
manufacturer. After 45 min at 37 °C, 70 µl of H2O was
added, and the DNA was extracted with phenol/chloroform,
ethanol-precipitated, and resuspended in 30 µl of H2O.
One-third of this sample was amplified by PCR in a final volume of 50 µl in the presence of 10 mM Tricine (pH 8.4), 50 mM KCl, 0.01% w/v gelatin, 1.5 mM
MgCl2, 2.5 units of Taq DNA polymerase, 0.4 mM of each dNTP and 50 pmol of both oligo 3 and 4 (5′ CGCGTAATACGACTCACTATAGGGACGAGCTCACGAACGAT 3′). PCRs were performed
in a HYBAID reactor (Biotechnology LTD; 30 cycles: 15 s 94 °C,
30 s 51 °C, 30 s 72 °C; 1 cycle 5 min 72 °C). The
DNA was gel-purified, phenol/chloroform-extracted,
ethanol-precipitated, and resuspended in 10 µl of H2O.
2-5 µl of this DNA was either used to transcribe RNA for the next
round of selection or subcloned. The following selection conditions
were applied: the first round used 4 pmol of CstF (0.2 µM), and subsequent rounds used 0.4 pmol (0.02 µM). The RNA concentration varied as follows: round 1, 0.2 µM; rounds 2 and 3, 0.2 µM; round 4, 0.4 µM; rounds 5 and 6, 1 µM; round 7, 2 µM; and round 8, 20 µM. PCR products from
the final round of selection were digested with BamHI and
SacI and subcloned into Bluescript KS vectors (Stratagene) for sequencing.
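The stringency schedule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: it assumes that the 20-µl filter binding volume applies to every round, and the names `conc_um`, `RNA_NM`, `cstf_nm`, and `rna_per_cstf` are hypothetical.

```python
# Illustrative sketch only; concentrations are taken from the text,
# all function and variable names are hypothetical.
REACTION_UL = 20  # filter binding reaction volume in µl (assumed for all rounds)


def conc_um(pmol, volume_ul=REACTION_UL):
    """Convert an amount in pmol to µM (pmol per µl equals µmol per litre)."""
    return pmol / volume_ul


# RNA concentration per selection round in nM (integers avoid rounding noise).
RNA_NM = {1: 200, 2: 200, 3: 200, 4: 400, 5: 1000, 6: 1000, 7: 2000, 8: 20000}


def cstf_nm(round_no):
    """CstF concentration in nM: 4 pmol in round 1, 0.4 pmol afterwards."""
    return 200 if round_no == 1 else 20


def rna_per_cstf(round_no):
    """Molar excess of RNA over CstF in a given round."""
    return RNA_NM[round_no] // cstf_nm(round_no)


for round_no in sorted(RNA_NM):
    print(f"round {round_no}: CstF:RNA = 1:{rna_per_cstf(round_no)}")
```

Under these assumptions, the CstF:RNA ratio runs from 1:1 in round 1 through 1:10 in rounds 2 and 3 to 1:1000 in round 8.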
SV40 wild type RNA was transcribed from the
plasmid pSV-L (15). The plasmid pSV-141/-1 (17) is an SV40 late
derivative in which the complete downstream region was replaced by an
XbaI linker and pBR322 sequences. The XbaI site
located in the polylinker region of the plasmid pSV-141/-1 was deleted
by digestion with BamHI and SalI, and the recessed
3′-termini were filled in with Klenow enzyme. The resulting plasmid
(pSV
-1) contains a single XbaI site immediately
downstream of the natural polyadenylation site. DNA oligonucleotides
containing XbaI site overhangs (CTAG) at their 5′-ends and
encoding sequences of interest in either sense or antisense orientation
were annealed and subcloned into the XbaI site of pSV
-1.
The correct insertions were confirmed by sequencing. All pSV
-1
derivatives were linearized with EcoRI, pSV-L with
DraI, and uniformly labeled RNA substrates were obtained by
SP6 RNA polymerase transcription and gel purification (42).
Cleavage reactions were performed as
described previously (42) with the following modifications: 20 fmol of
radioactively labeled RNA substrate was used, and reactions were
incubated for 1 h at 30 °C. In some experiments, 0.5 mM 3′-dATP and 1.5 mM MgCl2 were replaced by 1 mM ATP and 1 mM EDTA. HeLa CstF
was titrated between 0 and 10 ng, and crude CF
Im/IIm fractions (10 µl) were used. The
cleavage reactions were quantitated with a PhosphorImager 425 (Molecular Dynamics) and IPlab Gel (version 1.5, Signal Analytics Corp.). The calculation of cleavage activity took into account the loss
of radioactivity present in the downstream cleavage fragment. To
compare the cleavage activities of SV
-1 derivatives with SV40 late
pre-mRNA, the percentage of cleavage obtained with 4.5 ng of CstF
and SV40 was set to 100%. These relative cleavage activities were
determined in at least three independent cleavage reactions for each
construct (except for SV-B13: two independent experiments), and their
averages are presented in Fig. 3.
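The normalization described above can be sketched in code. This is an illustration under stated assumptions, not the authors' calculation: the text does not give the formula used to correct for label lost with the downstream fragment, so the sketch assumes the upstream product signal is scaled by the fraction of the precursor's label it retains. All names and all numbers in the example are hypothetical.

```python
from statistics import mean


def percent_cleaved(precursor, upstream, label_fraction_upstream):
    """Percentage of substrate cleaved, estimated from two band intensities.

    Assumption: the upstream cleavage product retains only
    `label_fraction_upstream` of the label of a uniformly labeled
    precursor, so its signal is divided by that fraction before comparison.
    """
    product = upstream / label_fraction_upstream
    return 100.0 * product / (product + precursor)


def relative_activity(sample_percent, reference_percent):
    """Cleavage relative to the reference (SV40 late at 4.5 ng of CstF = 100%)."""
    return 100.0 * sample_percent / reference_percent


# Hypothetical band intensities for one construct and for the SV40 reference.
sample = percent_cleaved(precursor=600, upstream=200, label_fraction_upstream=0.5)
reference = 50.0  # hypothetical percentage cleaved for SV40 late
print(relative_activity(sample, reference))

# Averaging independent experiments (hypothetical values).
print(mean([78.0, 80.0, 82.0]))
```

Dividing by the reference value puts every construct on the same scale, which is what allows the titration curves of different substrates to be compared directly.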
-end processing in vitro. Either selected
sequences or artificial constructs carrying shortened element 1 or
element 2a sequences were tested for their ability to restore 3′-end
processing of the cleavage-deficient pre-mRNA SV
-1, which lacks
its natural downstream element (for details, see "Experimental
Procedures"). A, RNA sequences of SV40 late, SV
-1, and
derivatives. The bipartite structure of the SV40 downstream element is
indicated by brackets and numbers above the
sequence (17); the CstF-binding site is screened in
gray (34). The sequences inserted into SV
-1 are underlined, and the minimal element 1 (11-mer element,
UGCGUUCCUCG) and the GU and AY motifs of element 2a are
screened in gray. SV-A42, SV-B13, and
SV-C1 carry the selected sequences A-42, B-13, and C-1.
SV-
B14 contains the selected sequence B-14 in antisense orientation and was used to embed the 11-mer element and shortened element 2a sequences. SV-E1P4 and SV-E1P12
contain the 11-mer element at positions 4 and 12 of the inserted
sequences, respectively. SV-E1d encodes a duplicated 11-mer element.
SV-E2a carries extended GU and AU motifs, SV-G/A contains a
minimal GU motif (GUGU) and a minimal AY motif (AUU). Other RNA
substrates contain these minimal motifs in different combinations. The
experimentally determined positions of the cleavage sites of some
substrate RNAs are indicated by arrowheads. The average
cleavage activities are given as relative cleavage activities in
comparison to SV40 late. B, cleavage reactions. HeLa CstF
was titrated between 0 and 10 ng, whereas the complementing factors
(CPSF, PAP, and CF Im/IIm) were present in
non-limiting amounts (for details, see "Experimental Procedures").
Black arrowheads indicate the precursor RNA, and open
arrowheads indicate the 5′ cleavage products. Quantitation of
cleavage activities of substrates carrying either element 1 (C)
or element 2a sequences (D). The cleavage reactions were
quantitated as described under "Experimental Procedures." The value
obtained for SV40 with 4.5 ng of CstF was set at 100%, and this
reference point is indicated by a vertical line. The average
of at least three independent experiments is presented (except for
SV-B13), and the relative cleavage activities of all constructs are
summarized in A.
Immunoblot Analysis
Proteins were separated on an SDS-7.5% polyacrylamide gel, blotted on nitrocellulose, and detected with chemiluminescence staining (ECL kit, Amersham Corp.) as recommended by the manufacturer. The CstF-64 polyclonal antibodies were diluted 1:10000.
UV Cross-linking
100 fmol of CstF and 400 fmol of
radioactively labeled RNA (Fig. 2, A-1) were
incubated in 12.5 µl containing 2 mM DTT, 20 mM creatine phosphate, 0.5 mM ATP, 1.5 mM MgCl2, 0.01% Nonidet P-40, 0.1 µg/ml
bovine serum albumin for 20 min at room temperature. After UV
irradiation (500 kJ, Stratalinker UV1800, Stratagene), 200 ng of RNase
were added and reactions were incubated for 30 min at 37 °C before
separation on an SDS-7.5% polyacrylamide gel. The gel was fixed (20%
2-propanol, 10% acetic acid), dried, and exposed to Kodak X-Omat AR
films.
Distinct sequence elements are selected by
CstF. All sequences selected by calf thymus CstF (named
A-x) or HeLa CstF (named B-x and C-x)
are shown, and their frequencies in the corresponding pools are
indicated on the right (frequency of insert). Some sequence alignments included the last uracil (white letter) of the 5′
constant region of the template RNA. Those nucleotides that led to the consensus sequences are screened in gray. The alignments are
presented for each consensus element, and the frequency of every
nucleotide is given in percentage; residues conserved to 100% are
screened in gray. The abundance of each insert in
the corresponding pool was taken into account. The derived consensus
sequences are shown for each element at the bottom. The frequency of
each element in the different pools is shown on the right
(frequency of element). Those inserts that shared less than 75%
identity to the consensus sequences are indicated by
asterisks.
Computer Surveys
The different consensus elements selected by CstF were translated into specific search programs written in VAX Pascal (43) and used to screen the EMBL data library (44; release 48). First, the programs searched for the polyadenylation signal AATAAA (no mismatch allowed). If this signal was found, the region up to 50 nt downstream was then searched for each consensus element. Pool 1 was obtained with element 1, demanding ATGCGTT with at least 5 matches and CCTCGTCC directly following. Pool 2a, screened with element 2a, demanded the sequence YGTGTY with at least 4 matches, followed directly or up to 5 nt later by TTYAYTG with at least 4 matches, and directly or up to 2 nt later by the sequence YGT. Two pool 2b screens with element 2b were performed. The first (pool 2b/T4) required TTGYT with at least 3 matches, followed directly or up to 5 nt later by ATTTTACT(T/G) with at least 3 matches, and directly or up to 2 nt later by the sequence YCT. The second screen (pool 2b/T3) differed from pool 2b/T4 in that ATTTACT(T/G) with at least 3 matches was required instead of ATTTTACT(T/G). As a positive control, pool M was screened for the consensus sequence for downstream elements, YGTGTTYY, proposed by McLauchlan et al. (8). No minimum number of matches was demanded for this screen.
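The logic of the pool 2a screen can be sketched as follows. This is an illustrative reimplementation in Python, not the original VAX Pascal program. It assumes that the short YGT part must match at all three positions and that the whole element must lie within the 50-nt window; neither detail is stated in the text. All names and the test sequence are hypothetical.

```python
# Sketch of the pool 2a screen; not the original search program.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}

SIGNAL = "AATAAA"
GU, AY, TAIL = "YGTGTY", "TTYAYTG", "YGT"


def matches(pattern, seq):
    """Count positions at which seq agrees with an IUPAC pattern."""
    return sum(base in IUPAC[p] for p, base in zip(pattern, seq))


def scan_region(region):
    """Return the offset of the first element 2a-like hit in region, or None."""
    for i in range(len(region) - len(GU) + 1):
        if matches(GU, region[i:i + len(GU)]) < 4:
            continue
        for gap1 in range(6):  # AY motif directly after, or up to 5 nt later
            j = i + len(GU) + gap1
            ay = region[j:j + len(AY)]
            if len(ay) < len(AY) or matches(AY, ay) < 4:
                continue
            for gap2 in range(3):  # YGT directly after, or up to 2 nt later
                k = j + len(AY) + gap2
                tail = region[k:k + len(TAIL)]
                # Assumption: YGT must match at all three positions.
                if len(tail) == len(TAIL) and matches(TAIL, tail) == len(TAIL):
                    return i
    return None


def find_element_2a(dna, window=50):
    """Return 0-based (signal, element) positions for each AATAAA with a hit."""
    hits = []
    pos = dna.find(SIGNAL)
    while pos != -1:
        start = pos + len(SIGNAL)
        offset = scan_region(dna[start:start + window])
        if offset is not None:
            hits.append((pos, start + offset))
        pos = dna.find(SIGNAL, pos + 1)
    return hits


# Hypothetical sequence: signal, 5-nt spacer, GU motif, 2-nt gap, AY motif, YGT.
example = "CC" + SIGNAL + "GGGGG" + "CGTGTC" + "AA" + "TTCACTG" + "CGT" + "GG"
print(find_element_2a(example))
```

Because each motif tolerates mismatches, a screen of this kind returns candidates rather than confirmed downstream elements, which is why a control screen with a previously proposed consensus is a useful point of comparison.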
CstF was purified from calf thymus whole cell extracts and twice independently from HeLa cell nuclear extracts (for details, see "Experimental Procedures"). Fractions of the final poly(U)-Sepharose columns used for the selection of RNA ligands (see below) are shown in Fig. 1A. HeLa CstF consists of three polypeptides with molecular masses of 50, 64, and 77 kDa (Refs. 27, 28, 30, 45; Fig. 1A, lanes 2 and 3). CstF purified from calf thymus also contained the 50- and 77-kDa polypeptides but differed from HeLa CstF in that the 64-kDa subunit was much less abundant and an additional polypeptide with a molecular mass of approximately 70 kDa was present (Fig. 1A, lane 1). To investigate whether this 70-kDa subunit represents an alternative form of the 64-kDa polypeptide, calf thymus and HeLa CstF were separated on high resolution SDS-polyacrylamide gels and either stained with silver (Fig. 1B) or used for Western blot analysis and immunodetection with polyclonal antibodies raised against the human 64-kDa subunit (Fig. 1C). On this gel, the 64-kDa subunit of HeLa CstF emerges as a doublet of 64 and 62 kDa and a significantly less abundant 66-kDa polypeptide (Ref. 30; Fig. 1B, lane 2). The 64-kDa subunit of calf thymus CstF is a doublet of 62 and 60 kDa (Fig. 1B, lane 1). Furthermore, a less abundant polypeptide with a molecular mass of approximately 52 kDa is visible (Fig. 1B, lane 1). Both 64-kDa doublets, the HeLa 66-kDa polypeptide and the calf thymus 52- and 70-kDa polypeptides, were recognized by the polyclonal antibodies (Fig. 1C). However, the monoclonal antibody 3A7 (30) directed against the human 64-kDa subunit did not recognize the 70-kDa subunit (data not shown). In addition, the 52- and 70-kDa subunits can be as efficiently UV cross-linked to RNA as the other 64-kDa polypeptides (Fig. 1D). These results reveal differences in the polypeptide composition of calf thymus and HeLa CstF with respect to the 64-kDa subunit that interacts with the RNA.
The precise nature of these differences must await the cloning of cDNA coding for the new subunit.
Selection of RNA Ligands by CstF
To investigate the sequence requirements for CstF-RNA interaction, purified calf thymus and HeLa CstF (Fig. 1A) were used for in vitro RNA selection experiments (38). The RNA substrates (60 nt) in which the central 20 nucleotides were randomized were subjected to filter binding reactions with pure CstF. Eight rounds of selection with increasing stringency were performed. To ensure that any putative ligand of CstF was selected during the first round of selection, a CstF:RNA ratio of 1:1 was chosen (0.2 µM CstF). During the subsequent steps of selection, the CstF concentration was kept constant (0.02 µM), whereas the RNA concentration was increased progressively from 0.2 to 20 µM corresponding to CstF:RNA ratios from 1:10 to 1:1000, respectively (for details, see "Experimental Procedures"). The RNA pool A was selected by calf thymus CstF (Fig. 1A, lane 1) and pools B and C by two independent preparations of CstF from HeLa cells (Fig. 1A, lanes 2 and 3). During this procedure, the affinities of the selected RNA pools increased 65-fold (apparent KD values approximately 5.6 nM; Table I) in comparison to the starting pool 0 (apparent KD 360 nM).
To analyze the RNA pools, PCR products of the starting pool 0 and the last selection rounds (pools A, B, and C) were subcloned, and 50 clones of each pool were sequenced. Some of these clones contained multiple insertions, so that altogether between 52 and 74 inserts of each pool were analyzed (Table I). No identical sequences were found in pool 0. In contrast, the number of different sequences was drastically reduced in those pools that have been subjected to selection by CstF: only 10 different sequences were found in pool A, 22 in pool B, and 14 in pool C (Table I). The sequences of these inserts are presented in Fig. 2.
The average abundance of the nucleotides in all pools is shown in Table I. A frequency of 5 out of 20 nt corresponds to a random distribution and was expected for all nucleotides in pool 0. This was only true for uracil (5.1) and guanine (4.3) but not for adenine (3.9) and cytosine (6.7). These deviations might be due to the unequal use of nucleotides during DNA-oligonucleotide synthesis. Upon CstF selection, the adenine content was reduced (1.7-2.0), the guanine content was nearly unchanged (4.4-5.3), and the amount of cytosine was only diminished in the calf thymus pool A (3.2). Instead, the selected RNAs were enriched in uracil, which was more significant in pool A (9.8) than in those pools that were selected by HeLa CstF (6.9, 7.1).
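The per-nucleotide averages discussed above can be computed as in the following sketch, which assumes that each insert is weighted by the number of times it was found in the pool. The function name and the two example inserts are hypothetical and are not sequences from the selected pools.

```python
from collections import Counter


def mean_composition(inserts):
    """Mean count of each nucleotide per insert, weighted by abundance.

    `inserts` maps an RNA sequence to the number of times it was observed.
    """
    totals = Counter()
    n_inserts = 0
    for seq, copies in inserts.items():
        for base, count in Counter(seq).items():
            totals[base] += count * copies
        n_inserts += copies
    if n_inserts == 0:
        raise ValueError("no inserts supplied")
    return {base: totals[base] / n_inserts for base in "ACGU"}


# Hypothetical pool of two 20-nt inserts, the second found three times.
pool = {
    "UUUUUGGGGGCCCCCAAAAA": 1,
    "UUUUUUUUUUGGGGGCCCCC": 3,
}
composition = mean_composition(pool)
print(composition)
```

For 20-nt inserts the four averages always sum to 20, so a value above 5 for one nucleotide, as reported for uracil after selection, must be balanced by values below 5 elsewhere.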
As shown in Table I, the number of different sequences in the RNA pools decreased upon CstF selection. Pool A, which was selected by calf thymus CstF, contained mainly two different RNAs (Fig. 2, A-1, 45.2% and A-2, 33.9%). The most abundant inserts of pool B, selected by HeLa CstF, were B-2 (15.1%), B-1, B-10, and B-15 (13.2% each). Pool C consisted of three prominent sequences (C-1, 36.5%, C-3A, 28.6%, and C-5, 14.3%).
Sequence compilation of all selected RNAs led to the deduction of three different sequence elements: element 1 and two closely related elements, 2a and 2b (Fig. 2). Element 1 had the consensus AUGCGUUCCUCGUCC, and each nucleotide of this element was conserved with a frequency of at least 75%; five residues were conserved to 100% (Fig. 2). Nucleotides 2-9 of this element were homologous to the consensus sequence for downstream elements YGUGUUYY (8). With the exception of a single RNA (A-22/3: 60% identity), all RNAs of this group were at least 87% identical to the derived consensus sequence.
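The identity values quoted for element 1 are consistent with position-by-position counting over the 15-nt consensus: 13 of 15 matches rounds to 87%, and 9 of 15 gives 60%. The sketch below illustrates that reading. The text does not state how identity was computed, so ungapped counting is an assumption, and the two variant sequences are invented for the example.

```python
# Illustrative sketch; ungapped position-by-position counting is assumed.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "Y": "CU"}

ELEMENT_1 = "AUGCGUUCCUCGUCC"


def percent_identity(seq, consensus):
    """Percentage of aligned positions at which seq fits the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")
    same = sum(base in IUPAC[c] for base, c in zip(seq, consensus))
    return 100.0 * same / len(consensus)


# Hypothetical variants with two and six mismatches, respectively.
two_mismatches = "AUGCGUUCCUCGUGG"
six_mismatches = "CCCGGUUCCUCGUGG"
print(round(percent_identity(two_mismatches, ELEMENT_1)))
print(round(percent_identity(six_mismatches, ELEMENT_1)))

# Nucleotides 2-9 of element 1 compared with the YGUGUUYY consensus.
print(percent_identity(ELEMENT_1[1:9], "YGUGUUYY"))
```

By the same counting, nucleotides 2-9 of element 1 (UGCGUUCC) fit YGUGUUYY at seven of eight positions.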
A second highly conserved element was
YGUGUYN₀₋₄UUYAYUGYGU (Fig. 2,
element 2a). Each nucleotide was conserved with a frequency
of at least 85%, and three residues were conserved to 100% (Fig. 2).
Element 2a had a bipartite structure as follows: a GU motif (YGUGUY) in
the 5′ part and a pyrimidine-rich part (AY motif, UUYAYUG) containing a
highly conserved adenine residue (94% conservation) followed by a
second shortened GU motif (YGU). The distances between the first two
parts were variable: 65% of all aligned inserts had insertions of 1-4
nt between the 5′ GU motif and the AY motif. Only a few (8%) of the
selected sequences also had insertions (1 to 2 nt) between the AY motif
and the 3′ YGU. With the exception of four selected RNAs (A-7 and B-7, 63%
identity; B-5 and C-37, 75% identity), all RNAs were at least 88%
identical to the consensus of element 2a.
The third element, element 2b
UUGYUN₀₋₄AUUUACUGN₀₋₂YCU,
strongly resembled element 2a but was less conserved (Fig. 2). Only the
inserts of pool C shared at least 76% homology to this element; A-11
and B-2 were 59% identical, and B-1 and B-36 were 69% homologous.
Element 2b differed from element 2a in that both the 5′-GU motif and
the AY motif were slightly altered. Furthermore, 88% had insertions (1 to 2 nt) between the AY motif and the 3′-YCU. Interestingly, the two
nucleotides inserted between the AY motif and the last three
nucleotides of both elements 2a and 2b were in most cases adenine and
cytosine residues, which therefore formed a second, shortened AY
motif.
Our results demonstrate that distinct sequence elements rather than random guanine and uracil residues are required for efficient CstF-RNA interaction. Furthermore, different sequence elements were selected to different extents by calf thymus and HeLa CstF; element 1 was present more frequently in those pools that were selected with HeLa CstF (pool B, 34.0%; pool C, 41.3%) than in the pool selected with calf thymus CstF (Fig. 2, pool A, 9.7%), whereas the closely related elements 2a and 2b were the most frequent elements in the calf thymus pool (sum of elements: pool A, 90.3%; pool B, 66.0%; pool C, 58.7%). These findings might indicate that the differences in the polypeptide composition of calf thymus and HeLa CstF are reflected in slightly altered RNA binding properties.
In Vitro 3′-End Processing Reactions with Selected Sequences as Downstream Elements
Selected element 1 and 2a sequences and
minimal versions of both were tested for their ability to restore
cleavage activity of the non-functional SV40 late pre-mRNA
derivative SV
-1, whose natural downstream region had been
substituted by an XbaI linker and unrelated sequence. DNA
oligonucleotides encoding the sequences of interest were inserted into
this XbaI site (for details, see "Experimental
Procedures"). The sequences of the first 49 nt following the AAUAAA
hexamer of all RNA substrates used are shown in Fig. 3A.
Three substrates (SV-A42, SV-B13, and SV-C1) contained selected element
1 sequences as downstream elements, whereas others contained only the
most highly conserved part of element 1 (11-mer element, UGCGUUCCUCG)
at different positions of the inserted oligonucleotides (SV-E1P4 and
SV-E1P12) or as a duplication (SV-E1d). The sequence of SV-
B14,
which carried the selected sequence B-14 in antisense orientation and
which was processed as inefficiently as SV
-1 in in vitro
cleavage reactions, was used to embed the 11-mer elements of SV-E1P4
and SV-E1P12.
The processing efficiencies of these constructs were determined in reconstituted in vitro cleavage reactions with highly purified PAP, CPSF, and partially purified CF Im/IIm fractions in non-limiting amounts, whereas highly purified HeLa CstF was titrated between 0 and 10 ng (for details, see "Experimental Procedures"). Typical cleavage reactions are shown in Fig. 3B. At least three independent titration experiments were performed for each construct (except SV-B13, two independent experiments), of which the averages relative to SV40 are presented in Fig. 3C. The amount of cleavage activity obtained with 4.5 ng of CstF and SV40 was defined as 100%, and the relative cleavage activities at this reference point for all RNAs are summarized in Fig. 3A.
All element 1 sequences restored cleavage activity of SV
-1 (Fig. 3,
B and C, left panel). SV-A42 and SV-B13 were
nearly as efficiently processed as SV40 (75-80%), whereas SV-C1 was
less active (55%). Furthermore, the 11-mer element alone was able to function as a downstream element (Fig. 3C, right panel).
SV-E1P4 was processed with moderate efficiency (45%), whereas
SV-E1P12, of which the 11-mer element was shifted further downstream,
was processed with high efficiency (80%). Duplication of this 11-mer element (SV-E1d) restored cleavage activity to wild type levels (Fig.
3A, 95%), and furthermore, SV-E1d was the only
construct that was cleaved exclusively at the natural poly(A) site. A
shift of the cleavage site was observed for all other element 1 constructs as follows: SV-B13, SV-C1, and SV-E1P4 were cleaved 4 nt,
SV-A42 7 nt further downstream of the natural poly(A) site. SV-E1P12 was processed at three different sites; the major cleavage site was
the natural poly(A) site but additional sites 7 and 12 nt further
downstream were efficiently used as well.
To investigate whether element 2a can function as a downstream element
in 3′-end processing in vitro, RNA substrates containing selected sequences or variants of this element were analyzed as described for element 1. The sequences of all element 2a-containing RNA
substrates as well as their relative cleavage activities are summarized
in Fig. 3A. The selected sequences A-1 and B-14 restored cleavage activity of SV
-1 with moderate efficiencies (40-55%; Fig.
3, A and D, left panel) but to the same extent as
the element 1 construct SV-E1P4 (55%). This moderate activity is
probably due to a non-optimal position of element 2a relative to the
AAUAAA signal.
To investigate the requirement of both the GU motif and the AY motif of
element 2a for downstream element function, several variants were
constructed that contained these motifs embedded into the
non-functional SV-
B14 sequence. SV-E2a contained extended, SV-G/A
significantly shortened element 2a sequences (Fig. 3A). SV-E2a was cleaved more efficiently (60%) than the minimal element 2a
substrate SV-G/A (45%). But the minimal substrate SV-G/A was still
processed to the same extent as SV-A1 (40%), which contains the
complete selected sequence A-1 (Fig. 3, A and D, left
panel). Substrates containing a deletion of either of these motifs
(SV-G/0 and SV-0/A) were only poorly processed (15-20%; Fig. 3,
A and D, right panel). This indicates that both
motifs are required for optimal CstF-RNA interaction.
To investigate whether the GU and AY motifs are functionally equivalent, RNAs were created that contained these motifs in inverted order or either of them duplicated. A switch of the positions of the minimal GU and AY motifs (SV-A/G, 30%; Fig. 3, A and D, right panel) as well as a duplication of the minimal AY motif (SV-A/A, 20%) led to a further reduction of the cleavage activity in comparison to SV-G/A. In contrast, SV-G/G containing two minimal GU motifs was processed as efficiently as SV-G/A and reached nearly wild type activity at higher CstF concentrations (Fig. 3, A and D, right panel). This indicates that the GU motif can substitute for the AY motif, but not vice versa, and suggests that the GU motif is the more important part of element 2a.
Taken together, these results demonstrate that the sequences selected
by CstF as well as shortened versions are able to function as
downstream elements in 3′-end processing of pre-mRNA, although to
different extents. Also, the fact that the selected sequences functioned as downstream elements in the absence of their constant flanking regions indicates that the flanking sequences played no
essential role in CstF binding during the SELEX procedure.
To investigate whether the selected sequence elements
can be found in 3′-untranslated regions of genes and thus might also function as downstream elements in vivo, a computer survey
was performed. The EMBL data library was screened with appropriate programs that searched for the presence of a perfect match to the
polyadenylation signal AATAAA. This pool (pool V) comprised 45,889 vertebrate and viral sequences, which were subsequently screened for
the presence of either of the selected elements up to 50 nt downstream
of the AATAAA signal. Pool 1 was obtained with element 1 (ATGCGTTCCTCGTCC; for details, see "Experimental Procedures") and
pool 2a with element 2a allowing a second gap in the 3′ part of the motif (TGTGTYN0-5TTYAYTGN0-2YGT). Two screens were performed with element 2b. Pool
2b/T3 was obtained with the short version
(TTGYTN0-5ATTTACT(T/G)N0-2YCT), and pool 2b/T4 was obtained with the longer variant of
element 2b (TTGYTN0-5ATTTTACT(T/G)N0-2YCT).
The gaps between the first and the second parts of elements 2a and 2b
were increased to five nucleotides according to alignments of these
motifs with already identified downstream regions (Refs. 8 and 34; data not shown). As a control, the pool M was generated by screening for the
presence of the consensus sequence for downstream elements YGTGTTYY
(8).
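As a rough illustration of this kind of screen, the degenerate consensus notation can be translated into a regular expression and searched in the 50 nt following each AATAAA hexamer. This is only a sketch under stated assumptions: the programs actually used are not described here, the function names are hypothetical, only exact matches to the consensus are reported (the survey also scored partial homologies), and the motif is assumed to lie entirely within the 50-nt window.

```python
import re

# IUPAC symbols used in the consensus sequences of this section.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

# Tokens: a gap such as N0-5, an alternative such as (T/G), or a single symbol.
TOKEN = re.compile(r"N(\d+)-(\d+)|\(([ACGT])/([ACGT])\)|([ACGTYRN])")


def consensus_to_regex(consensus):
    """Translate e.g. 'TGTGTYN0-5TTYAYTGN0-2YGT' into a regex string."""
    parts = []
    pos = 0
    while pos < len(consensus):
        m = TOKEN.match(consensus, pos)
        if m is None:
            raise ValueError(f"unrecognized symbol at position {pos}")
        if m.group(1) is not None:          # variable-length gap
            parts.append("[ACGT]{%s,%s}" % (m.group(1), m.group(2)))
        elif m.group(3) is not None:        # two-way alternative
            parts.append("[%s%s]" % (m.group(3), m.group(4)))
        else:                               # single IUPAC symbol
            parts.append(IUPAC[m.group(5)])
        pos = m.end()
    return "".join(parts)


def find_downstream(seq, consensus, window=50, signal="AATAAA"):
    """Return (signal_pos, motif_pos, matched_text) for each signal whose
    downstream window contains an exact match to the consensus."""
    motif = re.compile(consensus_to_regex(consensus))
    hits = []
    start = seq.find(signal)
    while start != -1:
        region_start = start + len(signal)
        m = motif.search(seq[region_start:region_start + window])
        if m:
            hits.append((start, region_start + m.start(), m.group(0)))
        start = seq.find(signal, start + 1)
    return hits
```

Expressing the gaps as bounded repeats keeps the search close to the written consensus; a screen that tolerates mismatches would need a scoring step instead of a plain regular expression.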
The numbers of sequences obtained for the different pools are presented in Table II with respect to the degree of homology to the requested element. The majority of pool V sequences did not fulfill the minimum requirement for the pool 1 screen, which demanded at least 5 matches to the first part of element 1 (ATGCGTT). Furthermore, only sequences with no more than 13 matches (87% identity) were found in this pool. In contrast, the pools 2a, 2b, and M contained fewer sequences that did not fulfill the minimum requirements for the distinct screens, and sequences with 100% identity were found. These discrepancies are most likely due to the fact that element 1 is strictly conserved, whereas both elements 2a and 2b are degenerate due to the pyrimidines and the gaps in their consensus sequences.
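The relation between match counts and percent identity for the 15-nt element 1 (for example, 13 of 15 matches corresponding to 87%) can be sketched as follows. The scoring scheme shown, an ungapped comparison at every offset, is an assumption made for illustration; the function names are hypothetical, and the additional requirement of at least 5 matches to the first part of element 1 is not implemented.

```python
ELEMENT_1 = "ATGCGTTCCTCGTCC"  # element 1 as used in the pool 1 screen


def best_ungapped_matches(sequence, element=ELEMENT_1):
    """Highest number of identical positions between the element and any
    window of the same length in the sequence (no gaps allowed)."""
    best = 0
    for offset in range(len(sequence) - len(element) + 1):
        window = sequence[offset:offset + len(element)]
        matches = sum(1 for a, b in zip(window, element) if a == b)
        best = max(best, matches)
    return best


def percent_identity(matches, length=len(ELEMENT_1)):
    """Matches expressed as a rounded percentage of the element length."""
    return round(100 * matches / length)
```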
To analyze whether the sequences identified by the computer survey
might be putative downstream elements in vivo, the locations of these sequences in the genes were investigated. Sequences of pool 1 with at least 10 matches, of pool 2a with at least 14 matches, of both
pools 2b (T3 and T4) with at least 15 matches,
and 175 sequences of pool M were analyzed. After the elimination of all non-vertebrate virus sequences, duplications or sequences that did not
contain any coding sequence, 322 sequences of pool 1, 179 of pool 2a,
61 of pool 2b, and 68 of pool M were analyzed in detail. As shown in
Table III, 32% of pool 1 sequences
contained the homology to element 1 inside the coding sequence, 45%
downstream of it, and 23% were found in introns. In contrast, in only
13% of pool 2a, 8% of pool 2b, and 12% of pool M sequences were the distinct motifs present within the coding sequences. The homologies in
pool 2b and pool M sequences were located either in introns (43 and
31%, respectively) or in 3′-UTRs (49 and 57%, respectively), and pool
2a sequences were found in 70% of the cases downstream of the coding
sequence. The majority of all elements downstream of the coding
sequence were in the context of the first AATAAA signal. Several
examples for pool 1, 2a and 2b sequences that contained the distinct
motifs downstream of the coding sequence are presented in Fig. 4. Taken together, homologies to all
selected elements can be found in the 3′-UTRs of several genes, and these sequences
probably function as downstream elements in 3′-end processing in vivo.
CstF was purified to homogeneity from calf thymus whole cell extract and HeLa cell nuclear extracts. Interestingly, their polypeptide composition differed with respect to the 64-kDa subunit (Fig. 1) which was shown to interact with the downstream elements of pre-mRNAs (31, 34). The 64-kDa polypeptide split into a 62/64-kDa doublet in HeLa CstF and a 60/62-kDa doublet in calf thymus CstF, which might be due to partial degradation. In addition, two polypeptides of 70 and 52 kDa were present in calf thymus CstF, which were recognized by anti-64-kDa polyclonal antibodies and could be UV cross-linked to RNA (Fig. 1, B-D). Whereas the 52-kDa protein might be a degradation product, the 70-kDa protein is an alternative form of the 64-kDa polypeptide that may result from alternative splicing. It is unlikely that its significantly different migration behavior on SDS-polyacrylamide gels is caused by post-translational modifications. Furthermore, the monoclonal antibody 3A7 (30) directed against the human 64-kDa subunit did not recognize the 70-kDa polypeptide, indicating the absence of the required epitope (data not shown). The precise nature of the new subunit will have to be determined by cDNA cloning.
It can be speculated whether alternative 64-kDa subunits might confer different RNA-binding properties to CstF. In fact, the RNA pool selected by calf thymus CstF differed from the selected HeLa pools: the uracil content was significantly higher (9.8) than in the HeLa pools (Table I, 6.9 and 7.1) and element 1 sequences were rare (pool A, 9.6%; pool B, 34%; and pool C, 41%; Fig. 2). However, no significant differences between these CstFs were observed in RNA-binding reactions or in reconstituted in vitro cleavage reactions with several RNA substrates (data not shown). Since both HeLa and calf thymus CstF contain a mixed population of different 64-kDa polypeptides, only separate analysis of these subunits can address this question in detail.
CstF Selects Highly Conserved RNA Ligands
In contrast to the high sequence variability of downstream elements in vivo, only three specific sequence elements were selected by CstF in vitro, element 1 (AUGCGUUCCUCGUCC) and two related elements 2a (UGUGUYN0-4UUYAYUGYGU) and 2b (UUGYUN0-4AUUUACU(U/G)N0-2YCU). All selected RNAs contain one of these motifs, and only a few sequences share less than 75% homology with the consensus elements 1 and 2a. Several nucleotides of these elements are highly conserved (at least 76% identity), which is surprising given the difficulty of determining a consensus sequence by alignments of naturally occurring downstream elements. Only element 2b is slightly less conserved. Since all purifications and selection experiments were performed independently, the enrichment of identical sequence elements in the different pools implies sequence-specific RNA-binding preferences for CstF-RNA interactions.
The selected elements share homologies to motifs that have been
proposed for downstream element function. Element 1 is a significantly extended version with one mismatch of the previously proposed consensus
sequence for downstream elements YGUGUUYY (8). Both elements 2a and 2b
contain novel combinations of a GU-rich motif similar to GUGUUG (9) and
an AY motif similar to CAYUG (10). All elements have a GU-rich motif in
their 5′-half in common. These GU motifs vary slightly as follows:
UGCGUU for element 1, YGUGUY for element 2a, and UUGYU for element 2b.
Interestingly, point mutagenesis of the downstream element of SV40
early pre-mRNA (23) that changed the natural sequence UUGUGGU to
either UUGUGUU or UUGUUGU and thus created
sequences identical to the selected GU motifs of element 2a and 2b,
respectively (underlined), increased the 3′-end processing efficiency
about 3-fold in comparison to wild type. These results and those
obtained by our in vitro selection experiments indicate that
GU motifs play a critical role in CstF-RNA interaction and that
specific rather than random GU-rich sequences seem to be preferred.
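The correspondence between the SV40 early point mutants and the selected GU motifs can be checked with a short sketch. The two patterns are simply the GU motifs of elements 2a (YGUGUY) and 2b (UUGYU) written as regular expressions; the dictionary and function names are hypothetical.

```python
import re

# GU motifs of the selected elements, with Y (pyrimidine) written as [CU].
GU_MOTIFS = {
    "2a": "[CU]GUGU[CU]",  # YGUGUY
    "2b": "UUG[CU]U",      # UUGYU
}


def gu_motifs_present(rna):
    """Return the names of the selected GU motifs found in an RNA string."""
    return {name for name, pattern in GU_MOTIFS.items()
            if re.search(pattern, rna)}


for label, rna in [("wild type", "UUGUGGU"),
                   ("mutant 1", "UUGUGUU"),
                   ("mutant 2", "UUGUUGU")]:
    print(label, rna, sorted(gu_motifs_present(rna)))
```

Under these patterns the wild-type sequence contains neither motif, whereas each mutant contains exactly one of them.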
In contrast to the conserved 5′ parts of all selected elements, the 3′
parts are more variable. In element 1, a pyrimidine-rich sequence is
present, whereas elements 2a and 2b contain AY motifs. These findings
suggest a bipartite structure and a variable sequence requirement in
the 3′ part of the RNA ligands. One can speculate that the 64-kDa
polypeptide of CstF binds to the RNA with two different domains since
it contains not only a ribonucleoprotein-like RNA binding domain
(RBD) but also 17 RGG-like motifs preceding and overlapping with the
MEAR(A/G) repeats, which have been suggested to form an α-helical
structure and to be involved in protein-protein interactions (31).
RGG-like motifs usually occur in proteins that also contain RBDs and
are often modified post-translationally to modulate RNA-binding
activity (for review, see Ref. 46). Modifications of these RGG-like
motifs may result in different sequence preferences for the 3′ part of
the RNA ligand. The existence of a second RNA-binding region in the
64-kDa subunit of CstF is also consistent with the results of a recent
SELEX study with the isolated RBD of the 64-kDa subunit of human CstF
(47). In contrast to the sequences selected with the complete CstF
factor described here, the RBD alone predominantly selected short
G/U-containing sequence elements. This difference is likely due to the
fact that amino acids outside of the RBD of the 64-kDa polypeptide
contribute to the binding specificity of CstF.
The adenosine residues of the AY motifs present in elements 2a and 2b
are highly conserved (at least 84%) and thus might be critical for
CstF-RNA binding. Further evidence for an involvement of adenosine
residues in CstF-RNA interaction comes from modification interference
assays with the selected RNAs A-1 and A-2 (data not shown) as well as
from two point mutagenesis experiments on downstream elements. It was
demonstrated that a stretch of five uracil residues is sufficient to
restore cleavage activity of a pre-mRNA that is otherwise not
processed. Inserting adenosine residues at four of these five positions
significantly decreased cleavage activity. Only the sequence UUAUU,
which resembles the central AY motifs of the selected elements 2a
(UYAYU) and 2b (UUACU), was processed as efficiently as UUUUU (14).
Point mutagenesis of the downstream element of adenovirus E2A revealed
a 1.3-fold stimulation in 3′-end processing when the sequence UUGUUU
was changed to UUAUUU (23). Since this effect was not as dramatic as
changes in the GU-rich motif of SV40 early pre-mRNA, GU-rich motifs
appear to play a more critical role in 3′-end processing than AY
motifs. This is also indicated by the finding that all selected
elements contained a GU-rich element but not all had an AY-rich
motif.
3′-End Processing
To investigate whether the selected sequences were
able to function as downstream elements in 3′-end processing, they were subcloned into an SV40 late pre-mRNA derivative whose
polyadenylation signals had been inactivated by deleting the natural
downstream region. All selected sequences tested, including shortened
versions, were able to restore cleavage activity, although to different extents. Those RNAs (SV-A42 and SV-E1P12), whose 11-mer element was
located as far downstream from the AAUAAA signal as the CstF-binding site of SV40 (34), were processed more efficiently than substrates that
contained the 11-mer element further upstream (SV-B13, SV-C1, and
SV-E1P4). This is in agreement with previous reports that showed the
dependence of both efficiency and accuracy of the cleavage reaction on
the position of the downstream element (14, 19, 21, 22, 24, 25).
Nevertheless, the only RNA that was cleaved with wild type efficiency
and accuracy was SV-E1d, which contained a duplication of the 11-mer
element and thus created a bipartite downstream element. Bipartite
downstream elements have not only been reported for SV40 late RNA (17)
but also for other RNAs (19, 20) and support the idea that CstF
contains two RNA-binding domains.
Selected element 2a sequences were also able to restore cleavage activity. The efficiencies of these RNA substrates were comparable to those element 1 constructs of which the 11-mer element was located at the beginning of the inserted sequence. This is most probably due to a non-optimal position of the downstream element relative to the AAUAAA signal.
Further analysis revealed that even short GU and AY motifs were able to
restore cleavage activity. Again, the GU motif was the most important
part of element 2a, since it could substitute for the AY motif but not
vice versa. This is in good agreement with the conservation of the GU
motif in the 5′ part of all selected elements and the already suggested
role for GU-rich sequences in downstream element function (see above).
Furthermore, our results demonstrate that CstF-RNA interactions during
3′-end processing tolerate significant mutations of the downstream
element. This is in contrast to the highly conserved sequences of the
elements that were selected by CstF in vitro in the absence
of any other 3′-end processing factor. It is likely that
protein-protein interactions between CstF and other components of the
3′-end processing machinery can compensate for weak CstF-RNA
interactions. Therefore, several sequences can function as downstream
elements although with different efficiencies. This might enable the
cell to carefully regulate 3′-end processing. It has been demonstrated
that overexpression of the 64-kDa subunit in stably transformed B-cells
induced alternative polyadenylation (37). Considering the different
polypeptide compositions of calf thymus and HeLa CstF, it is also
conceivable that this regulation might be influenced by the expression
of different 64-kDa subunits.
A computer survey of the EMBL data library was performed to
investigate whether the sequence elements selected by CstF in vitro were also present in genes and thus play a role in 3′-end processing in vivo. Homologies to either of the selected
elements 1, 2a, or 2b are present in 89% of all sequences with a
perfect match to the AATAAA hexamer (pool V). Taking into account that only 16% of all AATAAA signals are present in coding sequences (48),
about 70% of all homologies found should be located outside of protein
coding sequences. Indeed, element 2a was mainly found in the 3′-UTR of
genes. This strongly suggests a role for element 2a in 3′-end
processing in vivo, particularly if one takes into account
that 3′-UTRs are four to five times less abundant in this sequence
library than coding sequences. In contrast, element 1 was also
frequently found within protein coding sequences, a finding that at
first sight does not strongly argue for its involvement as a general downstream element
in vivo. But since these homologies included
the AATAAA hexamer, the presence of these sequences within the coding
region does not exclude the function of such a sequence in 3′-end
processing when appropriately located in 3′-UTRs.
Furthermore, 128 downstream element regions (8, 34) were screened for
the presence of the selected elements 2a and 2b (data not shown). About
51% of these sequences contained the selected elements with at least
70% identity downstream of their natural cleavage sites. Two of these
sequences were 94% identical to element 2a and were also detected with
the computer screen (Fig. 4, OCBGLO and
HEHS1ATI). Interestingly, a detailed study of the rabbit
β-globin pre-mRNA (Fig. 4, OCBGLO) identified the
sequence that exhibits the high homology to element 2a as the natural
downstream element (19).
Our results demonstrate that CstF purified from
two sources differed with respect to their 64-kDa subunit that is
responsible for CstF-RNA interactions. Calf thymus CstF contained an
additional, novel 70-kDa polypeptide that could be UV cross-linked to
RNA and that was recognized by polyclonal antibodies directed against the 64-kDa subunit. Considering the sequence variability of downstream elements, the selection of highly conserved sequence elements by CstF
was surprising. The selected motifs functioned as downstream elements
in in vitro 3′-end processing reactions. Homologies to all
selected elements were found in the 3′-UTRs of many genes. These
results strongly suggest that the sequences selected in vitro function as natural downstream elements in vivo.
We propose that the closely related elements 2a and 2b represent a
novel consensus sequence for downstream elements.
3′-UTRs, 3′-untranslated regions; FPLC, fast protein liquid
chromatography; RBD, RNA binding domain; PCR, polymerase chain
reaction; nt, nucleotide; DTT, dithiothreitol; Tricine,
N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine.
We thank Marvin Wickens for the plasmid pSV-141/-1; Georges Martin for recombinant bovine PAP; Elmar Wahle, Andreas Jenny, and Silvia Barabino for purified CPSF; Christine Milcarek and Kathleen Martincic for the polyclonal antibodies directed against the 64-kDa subunit of CstF; Clinton MacDonald for the monoclonal antibody directed against the 64-kDa subunit of CstF and sharing sequence information on downstream elements. We also thank Silvia Barabino, Lionel Minvielle-Sebastia, Mary O'Connell, Ursula Rüegsegger, and Elmar Wahle for comments on the manuscript. T. D. thanks Iain Mattaj and Gene Expression (EMBL) and Heiner Schirmer for stimulating discussions.