Inefficient spliceosome assembly and abnormal branch site selection in splicing of an HIV-1 transcript in vitro.

Continuous replication of human immunodeficiency virus type I (HIV-1) requires balanced expression of spliced and nonspliced mRNAs in the cytoplasm. This process is regulated post-transcriptionally by the viral-encoded Rev protein. An important prerequisite for Rev responsiveness is the presence of weak splice sites in the viral mRNA. We have investigated the splicing of the second intron of the HIV-1 Tat/Rev transcript in vitro and show that the 3′-splice site region is responsible for the inefficient splicing of the HIV-1 transcript. In contrast, the HIV-1 5′-splice site is highly functional in combination with a heterologous 3′-splice site. Incubation of the HIV-1 transcript in nuclear extract leads to a rapid accumulation of 50 S nonproductive pre-spliceosome complexes. These complexes contain mainly U1 and U2 small nuclear ribonucleoproteins and are formed independently of the presence of the downstream 3′-splice site. The HIV-1 transcripts, which do proceed through the first splicing step, utilize primarily a uridine as the branch acceptor nucleotide. Sequence comparison with other HIV-1 introns suggests that nucleotides other than adenosines are commonly used as branch points in these viruses.

Most mammalian genes are interrupted by introns, which are removed during processing of the primary RNA transcripts in the nucleus. The cellular splicing machinery recognizes conserved sequences near the two splice sites and removes the intron prior to nuclear transport to the cytoplasm. This is a complex process involving the assembly of U1, U2, U5, and U4/U6 small nuclear ribonucleoproteins (snRNPs) on the pre-mRNA (reviewed in Refs. 1 and 2). Initially, U1 snRNP interacts with a conserved sequence at the 5′-splice site, U2 snRNP then binds to the branch point region just upstream from the 3′-splice site, and the remaining U5 and U4/U6 snRNPs enter this complex as a tri-snRNP particle to form the complete spliceosome. In addition to the snRNPs, a number of other auxiliary protein factors play important roles in spliceosome assembly. In particular, ASF/SF2 and U2AF assist the binding of U1 and U2 snRNPs to conserved sequences at the 5′-splice site and the branch point region, respectively, and SC-35 is critical for the recognition of the 3′-splice site (reviewed in Ref. 1). The RNA splicing process is generally very efficient, leading to the export of only one major mRNA species to the cytoplasm. However, the expression of a substantial number of mRNAs is tissue-specific and developmentally regulated by alternative splicing. Differentiated splicing products may be obtained by a number of different mechanisms including the utilization of alternative 5′- and 3′-splice sites, exon and intron inclusion or skipping, and mutually exclusive exons, all of which have been found in biological systems (3). The complexity of the splicing reaction makes it a well suited target for regulation of gene expression, and the mechanisms involved in biological systems appear to be diverse (3).
Retroviruses have evolved a post-transcriptional regulatory system based on intron retention in order to express multiple proteins from the same promoter. Crucial for the life cycle of all retroviruses is a balanced expression of an unspliced mRNA of about 9 kb and a singly spliced mRNA of about 4 kb, encoding the Gag/Pol and Env proteins, respectively. Most of our knowledge about elements controlling this differential splicing comes from studies of avian sarcoma viruses. In this group of viruses the ratio between genomic and singly spliced mRNAs appears to be constitutive and mainly regulated by a suboptimal 3′-splice site (4-7) and a negative regulator of splicing (NRS element), which acts in cis to decrease the splicing efficiency of the viral transcript (8,9).
In complex retroviruses, which include lentiviruses (e.g. HIV-1), spumaviruses, and human T-cell leukemia virus, the post-transcriptional regulation of splicing appears to be more complex. In addition to the two classes of mRNA found in all retroviruses, the complex viruses express a major class of approximately 2-kb mRNAs, which encode a number of small regulatory proteins. Best studied are the HIV-1 regulatory proteins Tat and Rev, both of which are essential for virus propagation. Tat is a transcriptional activator, whereas Rev appears to function only at a post-transcriptional level, upregulating the appearance of unspliced and singly spliced mRNAs in the cytoplasm (reviewed in Ref. 10). There has been some controversy about which level of gene expression is subject to Rev regulation. Rev may function directly at the level of mRNA splicing (11-15), mRNA stability and transport (16-20), and/or translation (21,22). Although some of these functions may be closely coupled, this suggests that Rev is a multifunctional protein.
The specificity of Rev relies on direct binding to the Rev response element (RRE) and the presence of cis-acting repressive sequences within the transcript. In contrast to the RRE, which is a well defined RNA element located at the start of the env gene, the identity of cis-acting repressive sequences is less well understood. These elements may constitute nuclear retention or instability elements and have been mapped to the Gag, Pol, and Env regions (23-27).
Based on the observation that the introduction of weak splice sites into an RRE-containing β-globin gene renders the mRNA Rev responsive, it has been suggested that weak splice sites may function as nuclear retention elements (11). This interpretation is supported by a recent study showing that the integrity of the splice sites in HIV-1 mRNA is necessary for Rev regulation of HIV-1 gene expression (14). This implies that Rev-mediated regulation of HIV-1 gene expression requires an intrinsically inefficient splicing process. A search for splicing regulatory elements has identified a cis-acting repressor element within the first Tat-coding exon that suppresses splicing of the upstream intron in vitro and in vivo (28). This element is position-dependent but works in the context of a heterologous intron. cis elements controlling the splicing of the second intron of the Tat/Rev transcript have been investigated in vivo, and it was concluded that a nonoptimal 3′-splice site splicing signal is the main determinant for inefficient splicing (29). We have studied the splicing of the same HIV-1 intron and show that the 5′-splice site region directs a rapid accumulation of a 50 S complex, containing U1 and U2 snRNPs, and that the 3′-splice site is inefficiently recognized by the splicing apparatus. A possible role for the 50 S complex could be to retain the mRNA in the nucleus in the absence of Rev protein.
EXPERIMENTAL PROCEDURES
Preparation of RNA Transcripts. Pre-mRNAs were synthesized in vitro by T7 RNA polymerase transcription. The plasmid templates were generally linearized with HindIII, except for pPIPΔTAT, which was linearized with AluI, and TAT5′ss RNA, which was synthesized from the pTAT4 template that had been linearized with AvaII and treated with Klenow enzyme. In some experiments pTAT4 was linearized with Sau3A, HaeIII, and AvaI, resulting in truncated 3′-exons containing 80, 33, and 19 nucleotides, respectively. Transcription of radioactively labeled and capped mRNAs, used for in vitro splicing analysis, was done as described previously (12), and the RNAs were purified on 4% polyacrylamide denaturing gels. Biotinylated RNAs, used for streptavidin affinity purification, were synthesized in 100-µl reaction mixtures containing 5 µg of linearized plasmid, 40 mM Tris/HCl (pH 7.4), 6 mM MgCl2, 4 mM spermidine, 10 mM dithiothreitol, 40 units of RNasin, 0.7 mM ATP, 0.7 mM UTP, 0.7 mM GTP, 0.7 mM CTP, 0.07 mM biotin-11-UTP (10% of total UTP), 1 µCi of α-35S-UTP (Amersham Corp., 3000 Ci/mmol), and 200 units of T7 RNA polymerase. The RNAs were separated from unincorporated nucleotides on a Sephadex G-50 spin column and purified on 4% polyacrylamide denaturing gels. The final concentration of the RNA was calculated from the specific activity of incorporated 35S label.
FIG. 1. Constructs. A, pTAT3 and pTAT4 derive from pgTatCMV, which contains part of the HIV-1 HXB-3 genome. The TaqI fragment of pgTatCMV, which includes the Tat/Rev intron flanked by 234 and 84 bp of the 5′- and 3′-Tat-coding exons, respectively, was shortened by deleting parts of the intron. The resulting fragments were inserted downstream of a T7 promoter to yield pTAT3 and pTAT4. The RRE remains intact in pTAT3 but is absent in pTAT4. B, structure of the precursor RNA transcribed from pPIP7.A, pTAT4, and the chimerical constructs pPIPTAT, pPIPΔTAT, pTATPIP, and pΔTATPIP (the RNA transcripts are denoted by the plasmid names without the p prefix). PIP7.A contains a modified version of the first intron in the major late transcript of adenovirus (30). The PIPTAT transcript contains the 5′-half of PIP7.A RNA and the 3′-half of TAT4 RNA. The TATPIP transcript contains the 5′-half of TAT4 RNA and the 3′-half of PIP7.A RNA. PIPΔTAT and ΔTATPIP RNAs are similar to PIPTAT and TATPIP RNAs but with shorter HIV-1 splice site regions. TAT5′ss contains a truncated Tat/Rev gene including only the 5′-exon and 80 bp of the downstream intron of the TAT4 construct. Positions of restriction sites used for cloning are indicated, and segments originating from PIP7.A and TAT4 constructs are denoted with black and white bars, respectively. Exons and introns are denoted with thick and thin boxes, respectively, and major splice sites are indicated by open circles (5′-splice sites) and closed circles (3′-splice sites). The in vitro splicing activity of each construct is indicated (+, 50-70% splicing turnover; −, less than 5% turnover). The constructs are drawn to the scales indicated (note that the scale of Panel B is enlarged 5 times compared to Panel A). Numbering is done according to Ratner et al. (31). For details of vector construction, see "Experimental Procedures."
In Vitro Splicing and Complex Gel Analysis. In vitro splicing was performed essentially as described previously (12). For denaturing gel analysis, the mRNA was incubated for 90 min at 30°C, whereas for native gel runs and sucrose gradient centrifugations, samples were incubated for only 20 min at 30°C. To protect the RNA from 3′-end exonucleases in the nuclear extract, 10 µg of tRNA were included in each splicing reaction. Splicing products were analyzed on denaturing gels containing 6% polyacrylamide, 8 M urea, and 50-100 mM Tris borate (pH 8.3), and splicing complexes were analyzed by loading 5 µl of the splicing reactions onto native gels containing 2.5% 80:1 acrylamide/bisacrylamide and 50 mM Tris/glycine (pH 8.8).
Purification of Splicing Complexes and Northern Blotting. Streptavidin affinity purification of biotinylated RNA and probing for U1, U2, U4, U5, and U6 snRNPs were performed as described by Kjems and Sharp (32). U11 and U12 snRNA antisense RNAs were prepared as described by Wassarman and Steitz (33).
Debranching, Primer Extension, and Sequencing of RNA Products. Debranching of lariats was done as described in Ruskin and Green (34) by reincubating the phenol-extracted splicing reaction in a 25-µl mixture containing 20 µl of debranching buffer (50 mM HEPES (pH 7.8), 100 mM KCl, 0.1 M EDTA) and 5 µl of the S100 fraction of HeLa cell extract (35) for 30 min at 30°C, followed by phenol extraction and precipitation. Primer extension analysis was performed essentially as described in Kjems et al. (36), using gel-purified RNA templates and 1 pmol of 5′-end-labeled primer (5′-GTCGGGTCCCCTCGGG-3′), complementary to positions 15-30 downstream from the 3′-splice site in TAT4. To obtain sequence information, 1 mM ddTTP or ddATP, or 0.5 mM ddGTP or ddCTP, was included in each of the respective sequencing reactions. The TATPIP exon-exon product was purified and identified by the following procedure: 1 µg of nonradioactive TATPIP was spliced in a 200-µl splicing reaction and co-electrophoresed with a radioactively labeled sample on a denaturing gel. RNA co-migrating with the exon-exon band was purified, and the sequence across the ligation site was determined by primer extension as described above. Preparation of lariat TAT3 and TAT4 RNAs for primer extension was done similarly.
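As a quick consistency check on the primer description above, the following minimal Python sketch derives the RNA target implied by the primer sequence and confirms that its length matches the stated interval of positions 15-30. The function name `rna_target_of_primer` is an assumption made for illustration and is not part of the original methods.

```python
# Complement table for a DNA primer annealed to an RNA template.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_target_of_primer(primer_5to3: str) -> str:
    """Return the RNA sequence (5'->3') to which a DNA primer anneals.

    Hypothetical helper: reverse the primer and complement each base.
    """
    return "".join(
        DNA_TO_RNA_COMPLEMENT[base] for base in reversed(primer_5to3.upper())
    )


primer = "GTCGGGTCCCCTCGGG"   # primer from the text, written 5'->3'
target = rna_target_of_primer(primer)
span = 30 - 15 + 1             # positions 15-30, inclusive

print(target)
print(len(target) == span)
```

The check only compares lengths and base complementarity; it does not verify the target against the HXB-3 sequence itself, which is not reproduced here.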

RESULTS
A Nonoptimal HIV-1 3′-Splice Site Renders Splicing an Inefficient Process in Vitro. Controlling the cytoplasmic appearance of non-, single-, and double-spliced mRNAs is an important aspect of the HIV-1 life cycle. To determine elements important for this regulation, we have investigated the splicing reaction of the second intron of the double-spliced Tat and Rev transcript from HIV-1 strain HXB-3 using an in vitro splicing assay. This intron remains unspliced in transcripts encoding Gag/Pol, Env, Vif, Vpr, and Vpu, and its retention is subject to Rev control. The RNA transcripts TAT3 and TAT4, used in this study, contain the second and third exons encoding the Tat protein and different truncations of the second intron (Fig. 1). In the absence of Rev, these deletions did not influence the ratio between spliced and unspliced cytoplasmic RNA in vivo, suggesting that all sequences important for Rev-independent splicing remain intact (11,32).
The splicing efficiency was investigated by incubating the transcript in a nuclear extract from HeLa cells. This is an appropriate system for studying HIV-1 regulation of splicing, based on the observation that HIV-1 precursor RNA exhibits a similar splicing pattern when expressed in HeLa cells and T-cell lines (37). Under optimal splicing conditions, less than 2% of TAT4 was converted into a lariat product, migrating above the precursor RNA (Fig. 2A, lanes 1 and 2). Based on a primer extension analysis (see below), this band contains intermediate lariat (IL; intron bound to the 3′-exon) and possibly also lariat intron alone. The presence of the RRE within the intron (TAT3) had no effect on splicing efficiency compared to that observed using TAT4 RNA (result not shown).
To study the efficiency of the 5′- and 3′-splice sites of HIV-1 in vitro, a number of chimerical constructs between PIP7.A, a construct optimized for splicing, and TAT4 RNAs were constructed (Fig. 1). When the 5′-half of PIP7.A including the 5′-splice site was substituted with the 5′-half of HIV-1 mRNA (TATPIP; Fig. 1), splicing became highly efficient, yielding more than 70% splicing products (Fig. 2A, lanes 5 and 6). The identities of the branched splicing products were confirmed both by a change in mobility when altering the ionic strength of the gel and by debranching (Fig. 2B). Purification and direct sequencing of the exon-exon RNA product confirmed that the normal 5′-splice site of the HIV-1 RNA was correctly joined to the 3′-splice site of the PIP7.A RNA (result not shown).
FIG. 2 (legend, partial). − and + denote lanes with samples that have been incubated at splicing conditions for 90 min in the absence and presence of ATP, respectively. The bands were identified as follows. The intermediate lariat RNA (IL) and lariat RNA (L) products were identified by their abnormal behavior on gels containing different salt and acrylamide concentrations. In addition, gel analysis of debranched splicing reactions was performed on TATPIP-derived lariats (see Panel B). The linear splicing products, including the unprocessed precursor RNA (P), ligated exon-exon RNA (EE), and 5′-exon RNA (5E), were identified on the basis of apparent size as compared to molecular size markers, and the TATPIP-derived exon-exon product was sequenced by primer extension (data not shown). The samples were loaded on a 6% polyacrylamide gel containing 75 mM Tris borate (pH 8.3). B, lariat identification of TATPIP splicing products on a 6% gel containing 100 mM Tris borate. −D denotes untreated splicing products; +D denotes splicing products treated with debranching extract. M denotes the lane containing single-stranded DNA size marker; numbers on the left indicate the molecular sizes in base pairs. Identities of individual bands are indicated. The 5′-exon of TATPIP generally migrates as two bands of variable intensities. Sequence analysis of the TATPIP-specific exon-exon product showed no sign of alternative 5′-splice site usage, suggesting that the lower 5′-exon band may result from partial RNA degradation of the upper 5′-exon band.
When substituting the 3′-half of PIP7.A with the 3′-half of TAT4 (PIPTAT; Fig. 1), splicing was as inefficient as observed for TAT4 (Fig. 2A, lanes 3 and 4). Similar results were obtained when shorter regions of the HIV-1 transcript, containing the 5′- or 3′-splice site regions, were inserted into PIP7.A to replace the corresponding splice site (ΔTATPIP and PIPΔTAT, respectively; Figs. 1 and 2A, lanes 7-10), although a slight increase in splicing of PIPΔTAT RNA was observed as compared to PIPTAT and TAT4 (compare Fig. 2A, lanes 2, 4, and 8). These data imply that the region containing the 3′-splice site of the HIV-1 transcript is responsible for the inefficient splicing in vitro.
It has previously been shown that an element positioned downstream of the 3′-splice site of the first Tat intron inhibits the splicing of the upstream intron (28). To investigate the possibility that sequences within the 3′-exon flanking the second Tat/Rev intron function as inhibitory elements, the splicing efficiency of TAT4 transcripts truncated at different positions within the 3′-exon was analyzed. No significant differences in splicing efficiency were detected using constructs containing 87, 80, 33, and 19 nucleotides of the 3′-exon, implying that no cis-acting inhibitory elements are present in the 3′-exon of the TAT4 transcript (results not shown).
Identification of the Branch Point Sequence in the HIV-1 Intron. Examination of the sequence upstream from the 3′-splice site in the HIV-1 intron revealed no obvious branch site consensus. To identify the branched nucleotide utilized for the inefficient lariat formation in vitro, 1 µg of TAT3 and TAT4 transcripts labeled at low specific activity was incubated under splicing conditions for an extended period of time to increase the yield of splicing products. Approximately 2% of the radioactive label incorporated in TAT3 or TAT4 pre-mRNA transcripts appeared in bands corresponding to branched RNAs. The bands were excised from the gel, extracted, and annealed to a primer complementary to a region within the common 3′-exon of TAT3 and TAT4. When extended by reverse transcriptase, specific stops were observed as compared to a control reaction containing a template of unspliced TAT4 RNA. This suggests that the observed splicing product corresponds to an intermediate lariat. Surprisingly, the reverse transcription was almost completely terminated at a cytidine located 47 nucleotides upstream from the 3′-splice site in TAT3 and TAT4 (Fig. 3A, lanes 1 and 2), whereas no termination was observed at this position in the control reaction (Fig. 3A, lane 3). Since reverse transcription generally is arrested one nucleotide 3′ to a branched nucleotide, this strongly suggests that the sequence UACUUUC is recognized as the branch site and that the U immediately 5′ of the final C is branched to the 5′-end of the intron (Fig. 3B). In addition, weaker bands were observed at nucleotides more proximal to the 3′-splice site, which may represent alternative branch site nucleotides.
Splicing of the HIV-1 Intron Leads to Accumulation of a 50 S Pre-spliceosome Complex Containing Mainly U1 and U2 snRNPs. The splicing process requires a stepwise assembly of the snRNPs and other auxiliary splicing factors on the pre-mRNA in a highly ordered fashion. The splicing complexes formed on PIP7.A, TAT4 RNA, and chimerical constructs, when incubated in nuclear extract, were analyzed by native gel electrophoresis. As expected, PIP7.A and TATPIP, which both splice efficiently, formed pre-spliceosomes (complex A) and spliceosomes (complex B) very efficiently in the presence of ATP (Fig. 4A). In contrast, both TAT4 and PIPTAT formed only one complex, independent of ATP (Fig. 4A).
To investigate the content of these complexes in more detail, biotinylated mRNAs were incubated under splicing conditions and fractionated on sucrose gradients. The PIP7.A RNA sedimented as a 40 S peak, corresponding to the A complex, and a 60 S ATP-dependent peak, corresponding to the B complex. (The 60 S peak was visible only as a shoulder on the 40 S peak of the sucrose gradient profile shown in Fig. 4B due to excess mRNA in this type of preparative gradient.) In contrast, the TAT4 mRNA sedimented in a peak at around 50 S (Fig. 4B). This peak formed independently of ATP (data not shown). Specific splicing complexes were purified from individual fractions of the sucrose gradient by streptavidin affinity chromatography and tested for the content of snRNA by Northern blotting, probing with a mixture of antisense snRNAs (Fig. 4C). As expected, the 40 S complex of the PIP7.A construct contained mostly U1 snRNP and some U2 snRNA, and the 60 S complex of the PIP7.A construct contained all five snRNPs, corresponding to the fully assembled spliceosome. In contrast, the TAT4-specific 50 S peak contained only U1 and U2 snRNPs and no U4/U6 and U5 snRNPs, suggesting that the low level of HIV-1 intron splicing is due to an inefficient assembly of the spliceosome.
To measure the kinetics of the HIV-1-specific 50 S complex formation, a splicing reaction containing TAT4 was incubated for different periods of time prior to complex purification and Northern blot analysis. The 50 S peak containing U1 and U2 snRNP was fully formed within 10 min, and extended incubation did not change the complex composition (Fig. 5).
It has been suggested that the U11 and U12 snRNPs may play an important role in regulating the splicing in Rous sarcoma virus (38). To test a putative role of these snRNPs in HIV-1 splicing, the Northern blots shown in Fig. 4C were reprobed with a mixture of antisense U11 and U12 snRNA probes. Although both snRNPs were easily detected in nuclear extract, neither of the snRNAs appeared to be specifically associated with PIP7.A or TAT4 pre-mRNA (results not shown).
The 3′-Splice Site Region Is Dispensable for U1 and U2 snRNP Binding. To characterize the regions within the HIV-1 sequence responsible for U1 and U2 snRNP binding, Northern blot analysis was performed on chimerical mRNA constructs (Fig. 6, A and B). The snRNP content of individual fractions of the TATPIP gradient looked very similar to that of PIP7.A except that the 40 S complex of the chimerical construct contained relatively more U2 snRNP (compare Fig. 4C, left panel, and Fig. 6A). In contrast, incubation of PIPTAT pre-mRNA in nuclear extract produced a complex containing mainly U1 snRNP and relatively less U2 snRNP as compared to TAT4 (compare Fig. 4C, right panel, and Fig. 6B). This suggests that the 5′-half of the HIV-1 transcript binds not only U1 snRNP but also U2 snRNP. To analyze the complexes formed on a construct containing the HIV-1 5′-splice site region alone, Northern blot analysis was performed using TAT5′ss, which lacks the 3′-exon and most of the intron (Fig. 1). This construct does not produce any detectable splicing products when incubated in nuclear extract (TAT5′ss, Fig. 2A). An ATP-independent peak, sedimenting at around 40 S, was detected (results not shown), and Northern blot analysis of this peak revealed the presence of U1 and U2 snRNPs (Fig. 6C). Considering the differences in length of the transcripts, this 40 S peak may correspond to the 50 S peak observed for the TAT4 transcript, suggesting that the 3′-splice site region of TAT4 is dispensable for stable U1 and U2 snRNP interactions with the HIV-1 transcript.
[Figure legend, partial] ... (32). Unbiotinylated pre-mRNA (−) was used as a control for unspecific binding of snRNA to streptavidin beads and derived from fractions 10-12 and 19-21. Recovered complexes were denatured, and the eluted snRNAs were electrophoresed on an 8% polyacrylamide-8 M urea gel. The RNA was electroblotted onto nitrocellulose membrane and probed with a mixture of antisense U1, U2, U4, U5, and U6 snRNA that yielded approximately equal autoradiographic signals for the five snRNAs in the 60 S peak of the PIP7.A splicing reaction. The identities of the bands are indicated on the left side.
[Figure legend, partial] ... Fig. 4. PIP7.A was included for comparison. Splicing complexes of the pre-mRNAs were affinity-purified from pools of fractions 11-12, 15-16, and 19-20, corresponding to an approximate size of 40, 50, and 60 S, respectively. + and − indicate the presence and absence of biotin in the RNA, respectively, and the identities of the bands are indicated on the left side.

DISCUSSION
Balanced expression of differentially spliced mRNAs is an evolutionarily conserved feature among retroviruses. Studies of two distantly related retroviruses, Rous sarcoma virus and HIV-1, have revealed that the mechanisms controlling this balance are strikingly similar. Both viruses apparently use a combination of suboptimal 3′-splice site signals (4-7, 29) (this report) and cis-acting negative regulators of splicing (8,9,28).
The elements controlling the splicing of the second intron in the HIV-1 Tat/Rev transcript have been investigated previously in vivo (29). In agreement with that report, we found that the 3′-splice site region of the HIV-1 Tat intron contains inefficient splicing signals for in vitro splicing, whereas the 5′-splice site was highly efficiently spliced to a heterologous 3′-splice site. Our analysis also suggests that the inefficiency of splicing is not controlled by cis-acting repressive sequences in the downstream exon, as observed for the 3′-splice site of the first Tat intron (28).
A mammalian 3′-splice site consensus is composed of a highly conserved AG immediately upstream of the splice site, a continuous stretch of 7 or more pyrimidines (preferably uridines) just upstream from the AG nucleotides, and a branch point sequence YNYURAY (Y = pyrimidine, R = purine, N = any nucleotide), in which the highly conserved A is used as the branch point. During splicing the branch point sequence base pairs with GUAGUA in U2 snRNP, bulging out the branch point adenosine from the helix (reviewed in Ref. 1). Analysis of cellular genes in general shows that a long U-rich polypyrimidine tract can compensate for a poor branch site, and vice versa. However, inspection of the 3′-splice site of the second Tat/Rev intron reveals a very irregular polypyrimidine tract and no obvious branch point candidate. Improvement of the polypyrimidine tract, or introduction of a consensus branch site 29-35 nucleotides upstream from the 3′-splice site, increased the splicing efficiency in vivo, suggesting that both elements play an important role in maintaining a suboptimal 3′-splice site (29).
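To make the comparison with the consensus concrete, the following minimal Python sketch lists the positions at which a 7-nucleotide candidate departs from YNYURAY and scans a longer sequence for near-matches. The consensus and the HIV-1 site UACUUUC are taken from the text; the function names `mismatch_positions` and `scan` and the mismatch threshold are assumptions made for illustration. The sketch counts symbol mismatches only and does not model base pairing with U2 snRNA or the polypyrimidine tract.

```python
# Nucleotide classes for the consensus symbols used in the text.
CLASSES = {
    "Y": set("CU"),    # pyrimidine
    "R": set("AG"),    # purine
    "N": set("ACGU"),  # any nucleotide
    "A": {"A"},
    "U": {"U"},
}
CONSENSUS = "YNYURAY"  # position 6 is the conserved branch point A


def mismatch_positions(site: str, consensus: str = CONSENSUS) -> list[int]:
    """Return 1-based positions where `site` departs from the consensus."""
    site = site.upper().replace("T", "U")
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        pos
        for pos, (base, symbol) in enumerate(zip(site, consensus), start=1)
        if base not in CLASSES[symbol]
    ]


def scan(seq: str, max_mismatches: int = 2):
    """Yield (offset, window, mismatches) for windows near the consensus."""
    seq = seq.upper().replace("T", "U")
    width = len(CONSENSUS)
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        mismatches = mismatch_positions(window)
        if len(mismatches) <= max_mismatches:
            yield offset, window, mismatches


# The HIV-1 site from the text differs from the consensus at the purine
# position and at the branch point position itself.
print(mismatch_positions("UACUUUC"))
```

A mismatch count is a crude score; a position-weighted or pairing-based score would be needed to rank candidate sites more realistically.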
Surprisingly, we found that the uridine immediately 5′ of the final cytidine in the sequence UACUUUC functioned as the major branch site in the formation of lariat splicing product in vitro. Although naturally occurring branch sites may differ at several positions from the consensus, the branch point adenosine is highly conserved. Mutational studies have shown that the occurrence of uridines at positions 5 or 6 in a branch point motif decreases the splicing efficiency severely (39,40). It is possible that the capacity of the remaining nucleotides to form 5 Watson-Crick base pairs with the U2 snRNA can partially compensate for the lack of an adenosine at the branch point position.
Branching to a uridine residue has been observed in artificial systems (39) and in the splicing of the alternatively processed calcitonin/CGRP-I pre-mRNA (41). In a study using β-globin mRNA mutated at the branch site it was demonstrated that all 4 nucleotides can serve as branch acceptors with the following efficiencies: A > C > G > U in the first step of splicing (39). Alternatively, if an adenosine is present 1 nucleotide upstream of the normal branch site, this may function as a branch site (39). Since no adenosine is located adjacent to the branch point position in the Tat/Rev intron, branching may be forced to occur mainly at the uridine residue.
Inspection of sequences upstream from two other major 3′-splice sites in HIV-1 revealed a similar branch point sequence containing uridine at the putative branch point nucleotide (Fig. 3B). Several observations suggest that this resemblance is functionally important. The putative branch point sequences are highly conserved among different HIV-1 and HIV-2 strains, the rare substitutions that are observed do not significantly destabilize the base pairing with U2 snRNA, the branch points are positioned at approximately the same distance from the 3′-splice sites (47-52 nucleotides), and no other obvious homology is found between the sequences upstream of the different 3′-splice sites (Fig. 3B). It remains to be experimentally confirmed whether the putative branch sites indicated in Fig. 3B are functional in vivo.
Taking advantage of our in vitro approach, we were able to analyze the splicing complexes formed on the HIV-1 mRNA and thereby determine at what step the assembly of the spliceosome was inhibited. Most of the mRNA rapidly accumulated into a 50 S complex containing U1 and U2 snRNPs independent of the presence of ATP. A similar complex was formed when using an RNA lacking the 3′-exon and most of the intron, suggesting that both U1 and U2 snRNPs mainly interact with the 5′-splice site or surrounding regions. Early stages in spliceosome formation have been investigated on other mRNAs. An ATP-independent commitment complex containing U1 snRNP, U2AF, and possibly also SF2/ASF has been characterized as the initial product of spliceosome assembly (30,42,43). Formation of this complex requires the presence of both 5′- and 3′-splice sites and the polypyrimidine tract. The lack of a consensus 3′-splice site in the HIV-1 intron suggests that the 50 S complex formed may not represent a fully assembled commitment or A complex. In particular, the polypyrimidine tract may be too irregular for efficient binding of U2AF. Interestingly, both the TAT4 RNA and the 5′-splice site region alone bind U1 and U2 snRNP, whereas a construct containing the 5′-splice site region of PIP7.A and the 3′-splice site of TAT4 (PIPTAT) contains only U1 snRNP. An explanation for the U2 snRNP binding to the 5′-half of the TAT4 transcript, independently of the downstream 3′-splice site, could be that at least three functional 3′-splice sites, used in the expression of the Rev and Env genes, are present upstream of the 5′-splice site in the transcript. Even though none of these 3′-splice sites exhibit optimal polypyrimidine stretches or branch site sequences, it is possible that the binding of U2AF and U2 snRNP to these sites is stabilized through interactions with the U1 snRNP across the exon, as has been suggested for other exons (44). Alternatively, the U2 snRNP binding may be a result of an increased unspecific association with the HIV-1 mRNA, as observed previously with other mRNAs (42,43,45).
Supposing that similar complexes are formed in vivo, what function could this early arrest in spliceosome assembly play in the context of Rev regulation? While the nascent transcript is being synthesized, heterogeneous nuclear RNPs, snRNPs, and auxiliary splicing factors will interact with the RNA. The RRE within the HIV-1 transcript probably also binds Rev protein at this stage. A logical reason for an inefficient second step of spliceosome assembly may be to accumulate high levels of nuclear unspliced HIV-1 mRNA partially assembled into spliceosomes, which are capable of interacting with Rev. Interestingly, several other lines of evidence suggest that Rev functions on mRNA in the context of U1 snRNP. By a genetic approach it has been shown that the base pairing between U1 snRNA and conserved sequences at the 5′-splice site is important for Rev responsiveness (13). A similar result was recently obtained in a yeast system, demonstrating that some spliceosome assembly steps are required before Rev can act on the transcript (15). These observations are consistent with results obtained in vitro, which have demonstrated that Rev can inhibit splicing of an RRE-containing mRNA that has been preincubated in nuclear extract in the absence of ATP. In contrast, Rev completely loses inhibitory activity if the spliceosome is preassembled in the presence of ATP prior to addition of Rev (32). These observations, and the data presented in this report, strongly suggest that Rev acts on the transcript after U1 snRNP binding but before assembly of a complete spliceosome. Maintenance of intrinsically inefficient splicing signals, such as a nonoptimal branch site, may therefore play a key role in timing Rev function.