Upstream elements present in the 3'-untranslated region of collagen genes influence the processing efficiency of overlapping polyadenylation signals.

3'-Untranslated regions (UTRs) of genes often contain key regulatory elements involved in gene expression control. A high degree of evolutionary conservation in regions of the 3'-UTR suggests important, conserved elements. In particular, we are interested in those elements involved in regulation of 3' end formation. In addition to canonical sequence elements, auxiliary sequences likely play an important role in determining the polyadenylation efficiency of mammalian pre-mRNAs. We identified highly conserved sequence elements upstream of the AAUAAA in three human collagen genes, COL1A1, COL1A2, and COL2A1, and demonstrate that these upstream sequence elements (USEs) influence polyadenylation efficiency. Mutation of the USEs decreases polyadenylation efficiency both in vitro and in vivo, and inclusion of competitor oligoribonucleotides representing the USEs specifically inhibits polyadenylation. We have also shown that insertion of a USE into a weak polyadenylation signal can enhance 3' end formation. Close inspection of the COL1A2 3'-UTR reveals an unusual feature of two closely spaced, competing polyadenylation signals. Taken together, these data demonstrate that USEs are important auxiliary polyadenylation elements in mammalian genes.

Poly(A) tails are found on the 3' end of nearly every fully processed eukaryotic mRNA. The poly(A) tail has been suggested to influence mRNA stability, translation, and transport (for review, see Refs. 1-4). Polyadenylation is a two-step process that first involves specific endonucleolytic cleavage at a site determined by binding of polyadenylation factors (for review, see Refs. 5-10). The second step involves polymerization of an adenosine tail to an average length of ~200 residues. These steps are tightly coupled since reaction intermediates are not detectable under normal conditions. The vast majority of eukaryotic polyadenylation signals contain the consensus sequence AAUAAA between 10 and 35 nucleotides upstream of the actual cleavage and polyadenylation site. In addition, sequences 10-30 nucleotides downstream of the cleavage site are known to be involved in directing polyadenylation (Refs. 11-13 and references therein). These downstream elements (DSEs) can be characterized as a block containing 4 of 5 uracil (U) residues. These two sequence elements recruit cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulatory factor (CstF), respectively, to define the cleavage site; therefore, mutations within these sequences abolish polyadenylation.
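The spacing rules for the two core elements can be made concrete with a small check. The following Python sketch is illustrative only and is not part of this study: the function name `core_signal_elements`, the zero-based `cut` coordinate, and the way distances are counted (from the first base of each element to the cleavage position) are assumptions made for the example.

```python
def core_signal_elements(rna, cut):
    """Report whether the two core elements flank a putative cleavage site.

    rna: RNA (or DNA) sequence as a string.
    cut: zero-based index of the first nucleotide downstream of the cleavage
         site (a hypothetical convention chosen for this sketch).
    Returns a tuple (has_hexamer, has_dse).
    """
    rna = rna.upper().replace("T", "U")

    # AAUAAA whose first base lies 10-35 nt upstream of the cleavage site.
    has_hexamer = any(
        rna[start:start + 6] == "AAUAAA"
        for start in range(max(0, cut - 35), cut - 10 + 1)
    )

    # A 5-nt block with at least 4 U residues starting 10-30 nt downstream.
    has_dse = False
    for start in range(cut + 10, cut + 30 + 1):
        window = rna[start:start + 5]
        if len(window) == 5 and window.count("U") >= 4:
            has_dse = True
            break

    return has_hexamer, has_dse


if __name__ == "__main__":
    # Toy substrate (hypothetical sequence, not a collagen 3'-UTR).
    toy = "GGGGG" + "AAUAAA" + "C" * 27 + "UUGUU" + "CC"
    print(core_signal_elements(toy, cut=26))
```

In this toy case both elements are found; changing AAUAAA to AAGAAA removes the hexamer call while leaving the downstream element call unchanged.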
The intricate nature of this process implies that polyadenylation might be a useful mechanism to regulate gene expression. The efficiency of 3' end processing is a level at which regulation can occur. Because most pre-mRNAs in the cell are not efficiently processed, even small changes in the overall processing efficiency of a particular pre-mRNA may have a substantial effect on gene expression. Experimental evidence has demonstrated that poly(A) signal strength directly influences the amount of mature, exported mRNA (14,15). Poly(A) signal strength is also directly correlated with the rapid assembly of the polyadenylation machinery on the nascent transcript (16) and with transcription termination efficiency (17). Detailed mechanistic studies on regulation of polyadenylation are now emerging, revealing both cis- and trans-acting factors (for review, see Refs. 5, 9, and 18).
In addition to the sequence elements previously described, elements upstream of the AAUAAA sequence and downstream of the DSE (see Refs. 19 and 20 and references therein) have been identified as auxiliary cis-acting polyadenylation efficiency elements. Such elements may play an important role in modulating the overall processing efficiency. Upstream elements (USEs) have been characterized in viral and cellular systems, including SV40 (21), human immunodeficiency virus (22-27), adenovirus major late region (28,29), cauliflower mosaic virus (30), ground squirrel hepatitis virus (31,32), and human C2 complement (33,34). Spacing between the AAUAAA and the USEs plays a significant role in that the USEs closest to the AAUAAA are most important (21). Studies on human immunodeficiency virus suggest that definition of the polyadenylation site involves the recognition of multiple sequence elements, including the USE, in the context of the AAUAAA (27).
Comparisons between the polyadenylation signals of the SV40 late mRNA and other cellular mRNAs revealed that three human collagen genes, COL1A1, COL1A2, and COL2A1, each possess elements that resemble the SV40 late USEs (21) both in sequence (USE consensus UAU(2-5)GUNA, where the final U is repeated two to five times) and in position relative to the AAUAAA. Collagens are extracellular proteins that are responsible for the strength and flexibility of connective tissue. They account for 25-30% of all proteins present in animals and are the major fibrous element of skin, bone, tendon, cartilage, blood vessels, and teeth (see Ref. 35 and references therein). In addition to their structural role, collagens have a directive role in tissue development. The basic structural unit of collagen consists of three polypeptide chains that are extensively covalently cross-linked to each other. The composition of the chains depends on the type of collagen. Type I collagen consists of two COL1A1 chains and one COL1A2 chain, whereas type II collagen consists of three COL2A1 chains.
Interestingly, COL1A1, COL1A2, and COL2A1 each possess 3'-UTRs that are extremely conserved between human and other vertebrates, including mice, cows, chickens, and puffer fish (Refs. 36-43; see also GenBank and Table I). For example, human COL1A1 has a 3'-UTR ~1.5 kilobases in length with 2 polyadenylation signals. The first ~500 bases, containing the first polyadenylation signal, are 86.5% identical in mouse, followed by a ~600-base block of little conservation, and the final ~400 bases, including the second polyadenylation signal, are 71.1% identical (44). This high degree of evolutionary conservation suggests important regulatory functions of the collagen 3'-UTRs. It is important to note that the use of one polyadenylation signal over another will shorten the 3'-UTR, and this shortening could potentially remove important regulatory elements. When the sequences surrounding the polyadenylation signal are carefully examined, the percent identity is even higher, especially in the case of the final polyadenylation signal (see Table I). Mechanisms for alternative polyadenylation have been extensively studied for the calcitonin and IgM genes (45-47); however, it is not mechanistically understood how one collagen polyadenylation signal is chosen from among several.
Up-regulation of collagen gene expression takes place in a variety of diseases, including osteoarthritis and scleroderma, but it is unclear how this regulation of expression is accomplished (48-51). Some studies of scleroderma suggest that collagen production may be up-regulated by increased mRNA transcription (47,52), but the altered expression may not be fully explainable by changes in transcription rates and may additionally be accomplished by regulated post-transcriptional mechanisms, such as polyadenylation.
This study examines the regulation of 3' end formation in human collagen genes. The strong evolutionary conservation of the 3'-UTRs, particularly around polyadenylation signals, led us to believe that these regions contained key regulatory elements. We asked whether cis-acting USEs function as auxiliary elements to influence polyadenylation of the collagen mRNAs, whether these elements acted in a similar fashion to other defined USEs, and how these elements affect utilization of overlapping polyadenylation signals. We determined that the USEs present in these collagen genes do influence 3' end formation efficiency in these genes. Furthermore, the organization of alternative poly(A) signals in the COL1A2 gene suggests that assembly of 3' end processing factors on the distal signal prohibits assembly of the processing complex on the proximal signal. This suggests that protein-RNA interactions between core polyadenylation signal elements may represent a novel method of down-regulating polyadenylation signal usage.

MATERIALS AND METHODS
In Vitro Transcription of RNA Substrates-RNA transcripts for in vitro polyadenylation and cleavage reactions were synthesized using SP6 RNA polymerase according to the supplier (Promega) in the presence of 50 μCi of [32P]UTP (Amersham Biosciences or PerkinElmer Life Sciences). Transcription of COL1A2 yielded a 311-base RNA, transcription of COL2A1 yielded a 323-base RNA, and transcription of COL1A1 yielded a 274-base RNA. RNAs were gel-purified from 5% polyacrylamide, 8 M urea gels by overnight crush elution in high salt buffer (0.4 M NaCl, 50 mM Tris at pH 8.0, 0.1% SDS) before use in reactions. Eluted RNAs were ethanol-precipitated and resuspended in water.

TABLE I
Gene      Size of 3'-UTR   No. polyadenylation signals   % Sequence identity to final PA signal
COL1A1    ~1.5 kb          2                             92% cow, 98% macaque, 92% mouse
COL1A2    ~900 bases       5 or 6                        79% puffer fish, 84% chicken
COL2A1    ~400 bases       2                             95% dog, 86% cow, 89% rat

2.6% polyvinyl alcohol, and 1 × 10^5 cpm of 32P-labeled substrate RNA in a reaction volume of 12.5 μl (56). Cleavage reactions were allowed to incubate at 30°C for 1 h. Products were then processed and analyzed as described for polyadenylation products. Competition reactions were performed by adding increasing concentrations of specific or nonspecific oligoribonucleotides as indicated to the typical polyadenylation reaction mixtures and were allowed to proceed as described above. Reactions were quantitated using a Molecular Dynamics PhosphorImager and ImageQuant software.
Oligonucleotides-Oligonucleotides were synthesized on the Applied Biosystems 392 and 394 DNA/RNA synthesizers in the New Jersey Medical School Molecular Resource Facility (Newark, NJ). A list of the primers used in PCR reactions and cloning is found in Table II. An oligoribonucleotide representing the putative USE motifs of COL1A2 had the sequence AUUAAAUUGUACCUAUUUUG. A nonspecific oligoribonucleotide was also synthesized and had the sequence GUCACGUGUCACC.
Transfection and RNase Protection-Human 293T and HeLa cells were maintained in Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% fetal bovine serum (Sigma) and 1% penicillin-streptomycin (Invitrogen). Cells were seeded in 100-mm plates ~12 h before transfection. When cells reached 80% confluency, they were transfected using the Quantum Prep Cytofectene reagent (Bio-Rad). Plasmid DNA (8.4 μg) was diluted in 700 μl of serum-free medium to which 40 μl of Cytofectene was added, and the mixture was incubated at room temperature for 20 min. After the addition of 6.3 ml of medium (plus fetal bovine serum) to the mixture, the medium on the cells was removed and replaced with the entire transfection mixture. The Cytofectene-DNA mixture was removed 6 h after transfection and replaced with fresh medium. After 24 h, cells were washed once with phosphate-buffered saline. Cells were scraped and collected into 1 ml of phosphate-buffered saline and transferred into microcentrifuge tubes. Cells were then centrifuged at 1000 rpm for 5 min. The phosphate-buffered saline was aspirated, and pellets were stored at -80°C for no more than 2 days.
Total RNA was extracted from the cell pellet using the RNeasy mini kit (Qiagen) according to the manufacturer's spin protocol for isolation of total RNA from animal cells. Probe RNA was prepared as described above using T7 polymerase and generating the antisense of the pCβS-COL1A2 construct. The reporter RNA levels were determined by RNase protection using the RPAIII kit (Ambion Inc., Austin, TX) for 1 h at 37°C. DNA templates were then removed by DNase I digestion, and the RNA was phenol-extracted, ethanol-precipitated, and analyzed on 5% polyacrylamide, 8 M urea gels as described above.
Plasmids-COL1A2, COL2A1, and COL1A1 inserts were generated by PCR (Table II). The COL2A1 and COL1A1 primers contained BamHI (forward) and HindIII (reverse) recognition sites to allow insertion into appropriately digested vectors, whereas the COL1A2 primers contained BamHI (forward) and PstI (reverse) recognition sites. Gel-purified and appropriately digested PCR fragments were ligated into appropriately digested pGEM4 with T4 DNA ligase (Invitrogen) at 17°C overnight. The constructs were then transformed into Escherichia coli XL1-Blue cells. Positive clones were both sequenced and assayed for expression of appropriately sized clones. Sequencing was completed by the New Jersey Medical School Molecular Resource Facility using the Applied Biosystems 373 DNA Sequencer, and the resulting sequences were analyzed by BLAST computer programs for accuracy. Amplification reactions were performed using Platinum Taq polymerase (Invitrogen) in a total volume of 50 μl using standard reaction mixtures for 35 cycles of 95°C (1 min), 55°C (30 s), and 72°C (45 s) with an initial denaturation step of 95°C (5 min). PCR products were purified on a 1% agarose gel before use. Primers used in individual PCR reactions are listed in Table II. The COL1A2 double mutant 1,2 and the triple mutant 1,2,3 were prepared via the Stratagene QuikChange kit. Amplification reactions using Pfu polymerase were for 20 cycles of 95°C (30 s), 55°C (1 min), and 68°C (8 min) with a 95°C (5 min) initial denaturation step. To remove the remaining template after amplification, a DpnI digestion was performed for 90 min at 37°C. The PCR product was then transformed into E. coli XL1-Blue cells. Positive clones were isolated and sequenced to determine correct expression before use in polyadenylation reactions.
The PA2-G mutant COL1A2, the single USE mutants, and the double mutant 1,3 were created by the use of the megaprimer method of PCR mutagenesis as described previously (57). Briefly, a mutant PCR product was generated using an internal upstream primer containing the mutation, such as the PA2-G mutant COL1A2, and the wild-type COL1A2 downstream primer. This product containing the mutant sequence was then gel-purified and used as the downstream megaprimer for the second PCR. The COL1A2 wild-type upstream primer was used for the second PCR. A third PCR was then performed using the gel-purified second PCR product as the template and COL1A2 wild-type upstream and downstream primers. This final PCR amplified the inefficient product resulting from the second PCR. This final product was gel-purified, cloned into pGEM4, and transformed into E. coli XL1-Blue cells.
The construction of pIVA2 was described in Wilusz et al. (58). Briefly, the 155-base BamHI to PvuII fragment of pAd5-E1B (which contains adenovirus type 5 sequences from 3943 to 4122) was cloned into pGEM4 at the HincII and BamHI sites. Linearization with BglI yields a 158-base RNA containing the IVA2 poly(A) signal. The DNA template to generate IVA2-USE RNA, which contains a USE 5' of the AAUAAA, was constructed by a two-step PCR reaction using the megaprimer approach (57). The first PCR reaction used a standard SP6 primer and 5'-TATTTAGGGGTTTTGCGGGTTACAAATAAAGCCGCGCGGTAGGCCGG to generate a megaprimer that contained a USE insertion 13 bases upstream of the AAUAAA element. The second PCR reaction contained the megaprimer and the primer 5'-AGCTTGCATGCCTGCAGGTCGACTC. The product of this PCR reaction was then cut with BglII and used as a template to generate IVA2-USE RNA using SP6 RNA polymerase.
For transfection assays, all COL1A2 constructs were cloned into the BamHI-PstI sites of vector pCβS (a gift of David Fritz, UMDNJ) downstream of a cytomegalovirus promoter and upstream of a bovine growth hormone (BGH) polyadenylation signal. This vector also includes intron 1 of the rabbit β-globin gene accompanied by the splice donor and acceptor sites. Constructs were verified by sequencing.

RESULTS
USEs Can Stimulate in Vitro 3' End Processing of a Weak Polyadenylation Signal-Previously, auxiliary polyadenylation elements known as upstream efficiency elements (USEs) have been described and characterized in the SV40 late polyadenylation signal (21). It was determined that these elements functioned as efficiency elements in the SV40 system since their disruption resulted in reduced polyadenylation function (21). It has also been shown that a USE from the SV40 signal can replace the human immunodeficiency virus USE in mediating efficient 3' end formation in transient transfection assays (23). Both SV40 and human immunodeficiency virus have very strong polyadenylation signals. However, it has not previously been determined whether USEs could be added to a weak polyadenylation signal, such as the adenovirus IVA2 polyadenylation signal, to enhance 3' processing efficiency. A USE motif from SV40 having the sequence GCUUUAUUUGUAACC was inserted upstream of the AAUAAA in the IVA2 polyadenylation signal to create IVA2-USE, substrate RNAs were prepared from both pIVA2 and pIVA2-USE, and the RNAs were added to in vitro polyadenylation reactions. Fig. 1 shows that the presence of a USE in IVA2-USE enhanced polyadenylation efficiency ~4-fold as compared with IVA2 alone. These data indicate that insertion of a USE can increase polyadenylation efficiency of a weak processing signal, suggest that USEs can modulate poly(A) site definition, and suggest that USEs may be commonly found in cellular, not only viral, genes. It is important, therefore, to evaluate mammalian polyadenylation signals for USE elements.
USEs Can Be Found in Collagen Genes-A survey of numerous cellular polyadenylation signals revealed that elements resembling the USE motifs present in SV40 can also be found in many 3'-UTRs; that is, similar to the consensus UAU(2-5)GUNA and within 75 bases of the AAUAAA. We chose to focus on three collagen genes since their 3'-UTRs are highly conserved through evolution, suggesting regulatory function. Fig. 2A shows the comparison of the SV40 late polyadenylation signal to three human collagen genes for type I (COL1A1 and COL1A2) and type II (COL2A1) collagens. The polyadenylation signals AAUAAA/AUUAAA and the putative USE elements are underlined. We next wanted to determine whether these putative USEs present in the collagen 3'-UTRs could stimulate 3' end processing efficiency like the SV40 USEs. Previously it was shown that polyadenylation reactions containing an SV40 substrate RNA could be inhibited specifically by oligoribonucleotides representing the USE motifs (54). Plasmids encoding a portion of each collagen gene 3'-UTR containing the polyadenylation signals were created by PCR of human genomic DNA. Substrate RNAs for the polyadenylation reactions were prepared by in vitro transcription using SP6 polymerase in the presence of [32P]UTP. In vitro coupled cleavage and polyadenylation reactions were performed using HeLa nuclear extract and COL1A2 (Fig. 2B) substrate RNA. Oligoribonucleotides representing a putative USE corresponding to COL1A2 or a nonspecific oligoribonucleotide were also added to the reactions. The results for COL1A2 are shown as an autoradiogram of a typical in vitro polyadenylation reaction. The second lane in Fig. 2B, marked 0, represents a reaction performed in the absence of competitor oligoribonucleotides and demonstrates that the COL1A2 substrate RNA was efficiently polyadenylated in our in vitro system. Similar results were found with COL1A1 and COL2A1 substrate RNAs and their specific oligoribonucleotides (data not shown).
In each case, the specific USE oligoribonucleotide inhibited polyadenylation, whereas the nonspecific oligoribonucleotide had no effect on polyadenylation (see Fig. 2B and data not shown). Likewise, no effect was noted when a different nonspecific oligoribonucleotide was used for each substrate RNA (data not shown). Quantitation of the percent polyadenylated product is indicated below the autoradiogram of the gel in Fig. 2B. Additionally, oligoribonucleotides representing the collagen USEs can cross-compete in this assay (i.e. a COL1A1 oligo can compete with a COL1A2 substrate RNA; data not shown). Taken together, these data suggest that the oligoribonucleotides specifically bind and sequester a common factor(s) important for polyadenylation and suggest that the similarity to the SV40 motifs identified in the collagen 3'-UTR is functionally significant.
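The survey criterion used here (similarity to the consensus UAU(2-5)GUNA within 75 bases of the AAUAAA) can be approximated computationally. The Python sketch below is only an illustration and is not the procedure used in this study: it matches the consensus strictly, whereas the collagen elements merely resemble it, and the names `find_use_candidates` and `max_distance`, as well as measuring the gap from the end of the motif to the first base of the hexamer, are assumptions.

```python
import re

# Strict reading of the consensus UAU(2-5)GUNA; lookaheads allow overlapping hits.
USE_PATTERN = re.compile(r"(?=(UAU{2,5}GU[ACGU]A))")
# Canonical hexamer and the AUUAAA variant.
HEXAMER_PATTERN = re.compile(r"(?=(AAUAAA|AUUAAA))")


def find_use_candidates(rna, max_distance=75):
    """List USE-like motifs lying upstream of a poly(A) hexamer.

    Returns tuples (use_start, use_sequence, hexamer_start, gap), where gap is
    the number of bases between the end of the motif and the hexamer
    (an assumed way of measuring "within 75 bases").
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for hexamer in HEXAMER_PATTERN.finditer(rna):
        hexamer_start = hexamer.start(1)
        for use in USE_PATTERN.finditer(rna):
            gap = hexamer_start - use.end(1)
            if 0 <= gap <= max_distance:
                hits.append((use.start(1), use.group(1), hexamer_start, gap))
    return hits


if __name__ == "__main__":
    # Toy RNA: the SV40 USE motif from the text, a short spacer, then AAUAAA.
    toy = "GGC" + "GCUUUAUUUGUAACC" + "CCGGCC" + "AAUAAA" + "CAGG"
    for hit in find_use_candidates(toy):
        print(hit)
```

Applied to the toy RNA, the function reports the core UAUUUGUAA match within the SV40 motif together with its distance to the hexamer. A relaxed pattern or a position weight matrix would be needed to recover elements that only resemble the consensus.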
COL1A2 USEs Act as Auxiliary Polyadenylation Elements in Vitro-Because of the strong processing efficiency observed using the COL1A2 substrate RNA, we chose to focus our attention on that polyadenylation signal (diagrammed in Fig. 3A). We were also intrigued by the high degree of sequence conservation of this signal from diverse organisms (see Fig. 3A). We next made a series of substitutions replacing the COL1A2 USEs with BglII linkers to assess the contribution of the USEs to in vitro polyadenylation (diagrammed in Fig. 3B). USEs were replaced individually, as well as two at a time and three at a time. A nonspecific mutation was created by introducing a BglII linker in a non-USE-containing region upstream of the polyadenylation signal. RNAs were prepared from each construct by in vitro transcription in the presence of [32P]UTP, were gel-purified, and were added to in vitro polyadenylation reactions using HeLa nuclear extract. Reaction products were analyzed on 5% polyacrylamide, 8 M urea gels. The data were quantitated from multiple in vitro reactions and are presented in Fig. 3. Mutation of either USE 1, 2, or 3 alone had little effect on polyadenylation. Mutation of both USEs 1 and 3 simultaneously also had only a slight effect, but co-mutation of USEs 1 and 2 or USEs 1, 2, and 3 diminished polyadenylation efficiency to approximately half of wild type levels. A nonspecific mutation outside the USE region (NS mut) had no effect on polyadenylation. We conclude that none of the USEs is absolutely required for COL1A2 polyadenylation but that mutation of USE 2 in conjunction with at least one other USE led to the most dramatic decreases in polyadenylation.
FIG. 4. USEs influence in vivo polyadenylation efficiency of the COL1A2 signal. A, representative RNase protection assay using HeLa cells. Bands marked as open circles represent those fragments protected when the BGH polyadenylation signal was used and, therefore, represent polyadenylation at that signal; bands marked as asterisks represent those fragments protected when the COL1A2 polyadenylation signal was used and, therefore, represent polyadenylation at that signal. Because of the mutations created, these fragments were often different in size and are diagrammed for ease of interpretation on the right side of the figure. Lane 1, marker, pBR322 cut with MspI and 5' end-labeled with

COL1A2 USE Mutations Are More Deleterious in in Vivo Assays-Because all of our experiments so far have been performed in vitro, we found it important to verify our results in in vivo assays. We next cloned our COL1A2 wild type and mutant constructs into pCβS downstream of a cytomegalovirus promoter and upstream of a BGH poly(A) signal. Tandem polyadenylation signals have been used previously to examine requirements for a different type of auxiliary sequences in the lamin B2 gene (59). A T7 promoter on the opposite strand downstream of the BGH poly(A) signal was also present for ease in making antisense probes. The constructs were then transfected into HeLa or 293T cells, and after 24 h, total RNA was harvested. This RNA was added to RNase protection assays (using an antisense transcript from the T7 promoter as a probe). Representative RNase protection assays are shown in Fig. 4A, and the results of all our experiments were quantitated and are shown in Fig. 4B. The results show that use of the COL1A2 polyadenylation signal prevailed over use of the BGH polyadenylation site when the COL1A2 signal was wild type (lane 2) or had a nonspecific mutation (lane 7), but the USE mutations altered this ratio (lanes 4-6 and 8-10). The quantitated results were analyzed as the ratio of the protected RNA fragment corresponding to RNA polyadenylated at the COL1A2 site relative to those polyadenylated at the BGH poly(A) site (Fig. 4B). A large number means that the COL1A2 polyadenylation signals were preferentially used rather than the BGH polyadenylation signal, whereas a small number means that the BGH polyadenylation signal was preferentially used. The overall trends in the in vivo data correlate with the in vitro data; however, the USE mutants are more deleterious in varying degrees in vivo as compared with in vitro. USE 1 and 3 mutations alone had little effect on use of the COL1A2 polyadenylation signal, whereas USE 2 mutation decreased polyadenylation efficiency to approximately half of wild type.
Mutation of the USEs in duplicate or triplicate also reduced polyadenylation efficiency to approximately half of wild type.
FIG. 5. COL1A2 has the unusual feature of two closely spaced, competing polyadenylation signals. A, in vitro cleavage assay reveals poly(A) signal 2 is the predominantly used polyadenylation signal, whereas poly(A) signal 1 is a minor signal. Mutation of poly(A) signal 2 from AAUAAA to AAGAAA switches this predominance. Marker lane, transcript prepared from COL1A2 construct that was linearized with NspI (cuts between the two cleavage sites). B, in vivo polyadenylation assays reveal that USE mutants plus mutation of poly(A) signal 2 results in a marked decrease in polyadenylation efficiency. Lighter gray bars, 293T cell transfections; darker gray bars, HeLa cell transfections. Percent polyadenylation (% PA) was measured as the ratio of COL1A2 polyadenylation site utilization to the downstream bovine growth hormone polyadenylation site utilization as quantitated by RNase protection assays.
FIG. 6. CPSF binding between the core elements of the proximal polyadenylation signal of COL1A2 may inhibit complex assembly on the distal polyadenylation signal. Polyadenylation machinery can successfully assemble on the proximal signal (poly(A) signal 2) but may not support assembly of processing factors on the distal signal (poly(A) signal 1) due to spacing or steric constraints.
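The in vivo readout described in this section is a ratio of two band intensities (COL1A2 site usage over BGH site usage), compared between mutant and wild-type constructs. A minimal Python sketch of that arithmetic follows. It is an illustration under stated assumptions rather than the analysis pipeline used here: the function names are hypothetical, the intensity values are invented placeholders, and no correction for the label content of each protected fragment is applied.

```python
def site_usage_ratio(col1a2_signal, bgh_signal):
    """Ratio of COL1A2-site to BGH-site protected-fragment intensity."""
    if bgh_signal <= 0:
        raise ValueError("BGH signal must be positive to form a ratio")
    return col1a2_signal / bgh_signal


def relative_to_wild_type(ratios, wild_type_key="WT"):
    """Express each construct's ratio as a fraction of the wild-type ratio."""
    wild_type = ratios[wild_type_key]
    return {name: ratio / wild_type for name, ratio in ratios.items()}


if __name__ == "__main__":
    # Placeholder band intensities (arbitrary units), not measured data.
    ratios = {
        "WT": site_usage_ratio(800.0, 200.0),
        "USE 2 mut": site_usage_ratio(400.0, 200.0),
    }
    for name, value in relative_to_wild_type(ratios).items():
        print(f"{name}: {value:.2f} of wild type")
```

A large ratio corresponds to preferential use of the COL1A2 signal and a small ratio to preferential use of the BGH signal, so normalizing to wild type makes the effect of each mutation directly comparable across transfections.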
COL1A2 Has Unusual, Overlapping, Competing Polyadenylation Signals-Close examination of the COL1A2 mRNA sequence revealed an unusual feature: there are in fact two polyadenylation signals within 15 bases of each other (see Fig. 3A). Based upon the composition and spacing of the downstream CstF binding site (also known as the DSE) relative to the AAUAAA, it might seem that poly(A) signal 1 would be preferentially used instead of poly(A) signal 2. To formally investigate the question of which poly(A) signal was the major site of polyadenylation, we turned to cleavage assays using cordycepin, a chain-terminating analog of ATP. It turns out that poly(A) signal 2 is the major site of polyadenylation, whereas poly(A) signal 1 is the minor site (Fig. 5A). When a non-usable mutant of poly(A) signal 2 was created (AAUAAA to AAGAAA; PA-2 G), polyadenylation now switched to poly(A) signal 1 (Fig. 5A). This suggested that perhaps something more than USEs and sequence spacing of the AAUAAA relative to the DSE influences poly(A) signal choice in this system.
We then wanted to know how mutation of the USEs in combination with the poly(A) signal 2 mutant (PA-2 G) affected 3' end processing. In our in vitro assays, mutation of the poly(A) signal 2 consensus hexamer from AAUAAA to AAGAAA did not affect the overall polyadenylation efficiency (see Fig. 3B, PA-2 G mut). In our in vivo RNase protection assays, mutation of poly(A) site 2 had no effect on overall polyadenylation, but that mutation in conjunction with the double and triple USE mutations decreased polyadenylation to approximately one-fifth of wild type (see Fig. 4A, lanes 2-3 and 11-12, and Fig. 5B). Taken together, these data demonstrate that USEs influence 3' end formation efficiency in the COL1A2 gene.

DISCUSSION
In this study we have identified auxiliary 3' end processing elements in highly conserved regions of the 3'-UTRs of human collagen genes. These elements promote efficient polyadenylation in vitro and in vivo. In addition, COL1A2 has the unusual feature of overlapping polyadenylation signals, one of which predominates, an arrangement that suggests a novel mechanism for poly(A) signal down-regulation (see below). These findings provide initial insight into regulation of collagen gene expression that will hopefully aid our understanding of disease initiation.
Human type I and type II collagen genes all have highly evolutionarily conserved 3'-UTRs. Indeed, the functional importance of the collagen 3'-UTRs is implied by their conservation. The 3'-UTRs likely contain important regulatory sequences that influence polyadenylation site utilization and may also ultimately influence the cytoplasmic fate of the mRNA. Recognition of two core cis-acting elements (the AAUAAA and the downstream U-rich element) by the polyadenylation factors CPSF and CstF is the key determinant of mRNA-processing efficiency. In the case of large 3'-UTRs and/or multiple polyadenylation signals, additional auxiliary elements may be necessary to ensure proper polyadenylation. The 3'-UTRs of these three collagen genes likely require such auxiliary motifs to support the efficient assembly of polyadenylation factors. Interestingly, within the evolutionarily conserved regions of these 3'-UTRs exist elements containing close homology with the USE auxiliary polyadenylation elements of SV40. We show here that these USEs in the collagen 3'-UTRs act as auxiliary polyadenylation efficiency elements and that these USEs play an important role in an overlapping polyadenylation signal.
Our in vivo data suggest that the USEs might be most important for poly(A) signal 1 utilization since mutation of the USEs affects polyadenylation at that site more than when both poly(A) signals are intact. Our in vitro data also support this, although the effects are not as dramatic (data not shown). As has been appreciated more completely in recent years, 3' end formation is interconnected both to the other major RNA processing events, splicing and capping, and also to mRNA transcription (for review, see Refs. 6, 9, and 60). This interconnection likely results in the most efficient utilization of cis- and trans-acting signals. Thus, it is reasonable to expect that the in vivo data most closely mimic regulation at the cellular level and reflect the co-transcriptional nature of these processes.
The overlapping polyadenylation signals present in the COL1A2 3'-UTR are unusual. Our data demonstrate that poly(A) signal 2 is the major site of polyadenylation, whereas poly(A) signal 1 is used, but to a lesser extent (see Fig. 5A). These data suggest a model, shown in Fig. 6. The configuration of the overlapping signals sets up a competition between CstF binding to poly(A) signal 1 versus CPSF binding to poly(A) signal 2. Mutation of the AAUAAA in poly(A) signal 2 activates usage of poly(A) signal 1 (see Figs. 5A and 6). These data suggest that the two polyadenylation signals are in competition with each other. Steric hindrance may not allow the interaction between CPSF and CstF bound at the corresponding sites for poly(A) signals 1 and 2 simultaneously, or it may suggest that CPSF-RNA interactions are dominant over CstF interactions at the DSE for poly(A) signal 1. These data demonstrate a principle that protein-RNA interactions can interfere with scaffold assembly, suggesting a novel mechanism for repressing poly(A) signal usage. It remains to be seen whether this arrangement could be used to decrease polyadenylation efficiency at selected signals.