Identification of the Cellular Targets of the Transcription Factor TCERG1 Reveals a Prevalent Role in mRNA Processing*

The transcription factor TCERG1 (also known as CA150) associates with the RNA polymerase II holoenzyme and alters the elongation efficiency of reporter transcripts. TCERG1 is also found as a component of highly purified spliceosomes and has been implicated in splicing. To elucidate the function of TCERG1, we used short interfering RNA-mediated knockdown followed by en masse gene expression analysis to identify its cellular targets. Analysis of data from HEK293 and HeLa cells identified high confidence targets of TCERG1. We found that targets of TCERG1 were enriched in microRNA-binding sites, suggesting the possibility of post-transcriptional regulation. Consistently, reverse transcription-PCR analysis revealed that many of the changes observed upon TCERG1 knockdown were due to differences in alternative mRNA processing of the 3′-untranslated regions. Furthermore, a novel computational approach, which can identify alternatively processed events from conventional microarray data, showed that TCERG1 knockdown led to widespread alterations in mRNA processing. These findings provide the strongest support to date for a role of TCERG1 in mRNA processing and are consistent with proposals that TCERG1 couples transcription and processing.

TCERG1, which was previously known as co-activator of 150 kDa (CA150), was originally identified as a component of an active cellular fraction that supported Tat-activated transcription from the human immunodeficiency virus-long terminal repeat (1,2). Subsequent cloning and characterization determined that TCERG1 is composed of multiple protein domains, the most notable of which are three WW domains in the N-terminal half and six FF repeats in the C terminus (1). Immunodepletion of TCERG1 from HeLa nuclear extract results in the loss of Tat transactivation of the human immunodeficiency virus-long terminal repeat, with little effect on basal transcription (1). Overexpression of TCERG1 in cell culture represses expression from human immunodeficiency virus-long terminal repeat and α4 integrin reporter constructs by inhibiting transcription elongation (3). Inhibition of these minimal reporter constructs is promoter-specific and TATA box-dependent (3). Consistent with a role in elongation, TCERG1 is found associated with the elongation factors Tat-SF1 and P-TEFb (4). TCERG1 is also present in a complex with the RNA polymerase II (RNAPII) holoenzyme, and via its FF domains TCERG1 preferentially associates with the hyperphosphorylated form (IIO) (1,5). This experimental evidence demonstrates a tight and functional association of TCERG1 with elongation-competent RNAPII.
Accumulating evidence also implicates TCERG1 in RNA splicing. The second WW domain (WW2) of TCERG1 interacts with the splicing factors SF1 and U2AF and with components of the SF3 complex (6,7). TCERG1 has been identified in highly purified spliceosomes in multiple studies (8-10) and was recently identified as a substrate of CARM1, an arginine methyltransferase whose activity is known to affect alternative splicing (11). Overexpression studies demonstrate that TCERG1 can affect splicing of β-globin and β-tropomyosin minimal splicing reporters (7).
The processes of transcription and splicing are known to be coordinated by the C-terminal domain (CTD) of RNAPII. In addition to binding TCERG1, the CTD is known to interact with factors involved in capping, splicing, and polyadenylation (12-16). The CTD is widely accepted as the critical site for the assembly of the machinery responsible for transcription-coupled mRNA processing, and it is required for efficient splicing, polyadenylation, and termination of transcription in vivo (13,17). The modular structure of TCERG1, with splicing factor-associating WW domains in the N terminus and CTD-associating FF repeats in the C terminus, is ideally suited to a protein involved in coupling transcription and splicing. Consistent with this model, both halves of TCERG1 have been shown to be critical for the assembly of higher order transcription-splicing complexes (4). Fittingly, the Chironomus tentans TCERG1 homolog (hrp130) accumulates at the intron-rich Balbiani ring 3 gene (18). Attempts to elucidate the function of TCERG1 have so far been limited to biochemical analysis and transient overexpression studies utilizing artificial transcription and splicing reporters (1,3,6,7,19,20). An important gap in our knowledge is the identity of TCERG1-responsive cellular genes. This study combines RNAi-mediated knockdown and microarray analysis to identify cellular targets of TCERG1.
* This work was supported in part by National Institutes of Health Grant 1RO1 GM071037 (to M. A. G. B.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables 1-6.
1 Supported by the Medical Scientist Training Program at Duke University.
2 To whom correspondence should be addressed.
By utilizing data from two independent cell types, we have identified high confidence targets of TCERG1. Among these targets, we identified transcripts whose splicing decisions were dependent on TCERG1, and by utilizing a bioinformatics approach we provide evidence that TCERG1 impacts the processing of many cellular mRNAs.

EXPERIMENTAL PROCEDURES
Cell Culture-HEK293T cells were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum and antibiotics. HEK293T cells were transfected with pcDNA6-EGFP, and a pool of stable transfectants was selected with blasticidin to derive HEK293T-EGFP. HeLa-R19-LUC cells have been described previously (21).
RNA Isolation and Microarray Hybridization-For knockdown experiments, total RNA was isolated from HEK293T-EGFP and HeLa-LUC cells using the RNeasy kit (Qiagen) and assessed for quality with an Agilent 2100 Bioanalyzer (Lab-on-a-Chip). Hybridization probes were then prepared according to standard Affymetrix protocols for the human U133A or U133A_2 GeneChip arrays, and the arrays were scanned at a target intensity of 500 (Expression Analysis).
GSEA Analysis-Microarray data were normalized using RMA (RMA Express) before import into GSEA. GSEA is implemented in the software package GSEA-P from the Broad Institute (23), which is available online. GSEA-derived statistics were generated using 1000 permutations of gene tags.
Bioinformatic Analysis of Gene Expression Data-A program, SplicerAV, was written in Perl to analyze standard RMA-normalized Affymetrix microarray data for evidence of alternative splicing. The inputs used to calculate the evidence of alternative processing, or Odds Score, were the log2 fold change and the signal-to-noise ratio of each individual probe set derived from the expression data sets. The signal-to-noise ratio was calculated as the difference of the means of the two data sets divided by the sum of their standard deviations. A gaussian mixture model was used to calculate the maximum likelihood that the probe set log fold changes for a given gene (weighted by the square root of the signal-to-noise ratio) were generated by a single gaussian distribution, and the maximum likelihood that they were generated by two gaussian distributions. In this way the maximum likelihood of a single regulation event is compared with the maximum likelihood of two separate regulation events, the latter interpreted here as a change in alternative processing. To avoid overfitting, gaussians were not allowed to have a standard deviation of less than a 0.4 log2 fold change, which is a ~28% change in expression levels. The ratio of the maximum likelihood of the data being described by 2 gaussians to that of the data being described by 1 gaussian is referred to as the Odds Score. The Odds Score can then be used to rank the genes in descending order, creating a list of the most likely targets of alternative processing. All genes with a single probe set were excluded from analyses using this program. Other caveats are that a dead or inactive probe set within a gene with other functional probe sets would generate a high Odds Score, because it could appear that part of the gene is being up-regulated whereas the other part is not, and that data sets with genome-wide stronger signals (i.e. higher probe set log fold changes) will tend to generate higher Odds Scores.
Others (25,26) have previously used single probe set level data instead of multiple probe sets as a means of detecting alternative splicing; however, such algorithms may not have detected any of the alternative processing events presented in this paper, all of which spanned multiple probe sets. For a detailed discussion of probe set discrepancies in Affymetrix microarrays, see Stalteri and Harrison (43). A list of top targets as predicted by the program is included as supplemental tables.
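The Odds Score computation described above can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation of SplicerAV: the expectation-maximization (EM) fit, the use of sample standard deviations, treating the square root of the absolute signal-to-noise ratio as a per-probe-set likelihood weight, and all function names are assumptions made for illustration. Only the 0.4 log2 standard deviation floor, the comparison of one versus two gaussians, the likelihood ratio as the score, and the exclusion of single probe set genes are taken from the text.

```python
import math
from statistics import mean, stdev

SD_FLOOR = 0.4  # minimum gaussian SD in log2 fold-change units (from the text)


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b); sample SDs are an assumption."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def _safe_log(p):
    # log(0) is treated as -inf so that an emptied mixture component drops out
    return math.log(p) if p > 0 else -math.inf


def _log_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _loglik_one(x, w):
    """Weighted maximum log-likelihood of one gaussian with a floored SD."""
    total = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / total
    sd = max(math.sqrt(var), SD_FLOOR)
    return sum(wi * _log_pdf(xi, mu, sd) for wi, xi in zip(w, x))


def _loglik_two(x, w, iterations=200):
    """Weighted log-likelihood of a two-gaussian mixture fitted by EM (SDs floored)."""
    mu, sd, pi = [min(x), max(x)], [SD_FLOOR, SD_FLOOR], [0.5, 0.5]
    total = sum(w)
    for _ in range(iterations):
        # E-step: responsibility of each component for each probe set
        resp = []
        for xi in x:
            logs = [_safe_log(pi[k]) + _log_pdf(xi, mu[k], sd[k]) for k in range(2)]
            norm = _logsumexp(logs)
            resp.append([math.exp(lg - norm) for lg in logs])
        # M-step: weighted updates, with each SD clamped at SD_FLOOR
        for k in range(2):
            nk = sum(wi * r[k] for wi, r in zip(w, resp))
            pi[k] = nk / total
            if nk > 0:
                mu[k] = sum(wi * r[k] * xi for wi, r, xi in zip(w, resp, x)) / nk
                var = sum(wi * r[k] * (xi - mu[k]) ** 2 for wi, r, xi in zip(w, resp, x)) / nk
                sd[k] = max(math.sqrt(var), SD_FLOOR)
    return sum(
        wi * _logsumexp([_safe_log(pi[k]) + _log_pdf(xi, mu[k], sd[k]) for k in range(2)])
        for wi, xi in zip(w, x)
    )


def odds_score(log_fold_changes, snrs):
    """Max likelihood under 2 gaussians divided by that under 1 gaussian (>= 1)."""
    if len(log_fold_changes) < 2:
        raise ValueError("genes with a single probe set are excluded")
    weights = [math.sqrt(abs(s)) for s in snrs]  # assumed weighting scheme
    if sum(weights) == 0:
        raise ValueError("need at least one probe set with a non-zero signal-to-noise ratio")
    ll_one = _loglik_one(log_fold_changes, weights)
    # A 2-gaussian fit can always reproduce the 1-gaussian fit, so take the larger value
    ll_two = max(_loglik_two(log_fold_changes, weights), ll_one)
    return math.exp(ll_two - ll_one)
```

Because a two-gaussian mixture can always reproduce a single-gaussian fit, the sketch keeps the larger of the two log-likelihoods, so scores start at 1, matching the minimum Odds Score described in the Results. A gene whose probe sets split into two clusters, such as `odds_score([0.0, 0.05, 3.0, 3.1], [1, 1, 1, 1])`, scores well above 1. For very large likelihood ratios, returning the log of the score would avoid floating-point overflow.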
Normalized Comparison of Mock Versus EGFP and Mock Versus TCERG1 Knockdowns B and C-To compare two lists with different distributions of probe set log fold changes, sub-distributions (subL) were first generated from each original distribution (L) such that they were matched for the maximum absolute value of each gene's log fold change. Starting with the highest maximum absolute value in the control master list (L), genes were drawn alternately from each original distribution L (i.e. Mock versus EGFP and Mock versus TCERG1-B) and added to the corresponding sub-distribution, subL (i.e. subMock versus EGFP or subMock versus TCERG1-B), each time drawing the gene with the next lower maximum absolute log fold change. In this way two subLs, one from each original distribution, were drawn that could be compared directly without confounding by differences in overall log fold change magnitudes.
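A minimal sketch of the alternating draw is shown below, assuming each master list is a mapping from gene to its maximum absolute log fold change (MLFC). The starting cutoff of 1 follows the step-by-step description of this procedure given elsewhere in this text; requiring each drawn gene to fall strictly below the current cutoff and stopping as soon as either list runs out are assumptions, and the function names are hypothetical.

```python
def max_abs_log_fold_change(probe_log_fold_changes):
    """Gene-level MLFC: the largest absolute log fold change among its probe sets."""
    return max(abs(v) for v in probe_log_fold_changes)


def matched_sublists(control, treatment, initial_cutoff=1.0):
    """Draw genes alternately from two master lists, each time taking the next gene
    whose MLFC is strictly below the MLFC of the previously drawn gene.

    control, treatment: dicts mapping gene -> MLFC (e.g. Mock vs EGFP, Mock vs TCERG1-B).
    Returns two lists of (gene, MLFC) pairs matched for maximum log fold change.
    """
    masters = [
        sorted(control.items(), key=lambda item: item[1], reverse=True),
        sorted(treatment.items(), key=lambda item: item[1], reverse=True),
    ]
    position = [0, 0]  # next unexamined index in each master list
    sub = [[], []]
    cutoff = initial_cutoff
    turn = 0  # start with the control list
    while True:
        ranked, i = masters[turn], position[turn]
        # Genes at or above the cutoff can never qualify later (the cutoff only falls)
        while i < len(ranked) and ranked[i][1] >= cutoff:
            i += 1
        if i == len(ranked):
            break  # stop when either list runs out
        sub[turn].append(ranked[i])
        cutoff = ranked[i][1]
        position[turn] = i + 1
        turn = 1 - turn
    return sub[0], sub[1]
```

Because the cutoff only decreases, the two resulting lists interleave in MLFC, which is what keeps their fold change magnitudes comparable.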
Statistical Analysis of Odds Scores-A Kolmogorov-Smirnov (KS) test was performed on the top 100 genes to examine the probability that these genes came from the same distribution (two-sided KS test) or if one distribution was greater than another (one-sided KS test). This analysis was performed for the maximum absolute value corrected sub-distributions.
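The text does not name the software used for the KS tests. As an illustration only, the sketch below computes two-sample KS statistics for two lists of Odds Scores (for example, the top 100 from each matched sub-distribution) and uses the standard large-sample approximations for the p values; exact small-sample p values, such as those reported by statistical packages, can differ. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def _ecdf(sorted_sample, x):
    """Fraction of the sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistics with asymptotic p values.

    Returns (D, D_greater, p_two_sided, p_a_greater), where D_greater measures
    how far sample a is shifted toward larger values than sample b.
    """
    sa, sb = sorted(a), sorted(b)
    points = sa + sb  # the ECDF difference only changes at observed values
    d_two = max(abs(_ecdf(sa, x) - _ecdf(sb, x)) for x in points)
    # If a tends to be larger, its ECDF lies below that of b
    d_greater = max(0.0, max(_ecdf(sb, x) - _ecdf(sa, x) for x in points))
    en = math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    # One-sided Smirnov limit: P(en * D+ > t) ~ exp(-2 t^2)
    p_greater = min(1.0, math.exp(-2.0 * (en * d_greater) ** 2))
    lam = en * d_two
    if lam < 0.2:
        p_two = 1.0  # the Kolmogorov tail is indistinguishable from 1 here
    else:
        # Kolmogorov limit: Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)
        p_two = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
        p_two = min(1.0, max(0.0, p_two))
    return d_two, d_greater, p_two, p_greater
```

For the one-sided question "are knockdown Odds Scores higher than control Odds Scores?", the knockdown scores would be passed as `a` and the fourth returned value read as the p value.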
Statistical and Experimental Validation of SplicerAV-The original RMA-normalized microarray intensity values from the TCERG1(−)_293 (n = 6) experimental condition were each compared with the average of the TCERG1(+)_293 (n = 6) control condition to determine six fold change values for each probe set. The probe sets within a gene were then grouped using the groupings predicted by SplicerAV (A or B in the output shown in supplemental Table 5). All normalized fold change values for the probe sets within group A or group B were assembled into two new groups. A Welch's t test was performed on these two new groups to calculate the probability that the observed fold changes were the same. This probability was then Bonferroni-corrected, given that N probe sets within a gene can be grouped in a total of 2^(N−1) − 1 possible ways. Low p values indicate that the two groups of probe sets predicted by SplicerAV do not behave the same, which could result from alternative processing, poor probe set annotation, or bad probe sets. RT-PCR validation was performed under semi-quantitative conditions using radionucleotide incorporation. Products were resolved by 6% PAGE. Quantification was performed by exposure to a phosphorimaging screen and analysis with ImageQuant (GE Healthcare). PCR primer sequences will be made available upon request.
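The grouping test and its Bonferroni correction can be sketched as follows. The functions are hypothetical helpers: `welch_t` returns the statistic and the Welch-Satterthwaite degrees of freedom, and turning them into a p value requires a t distribution routine (for example, SciPy's `scipy.stats.t.sf`), which the Python standard library does not provide. The correction factor counts the ways to split N probe sets into two non-empty groups, (2^N − 2)/2 = 2^(N−1) − 1.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error of mean a
    vb = variance(group_b) / len(group_b)  # squared standard error of mean b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


def n_groupings(n_probe_sets):
    """Ways to split N probe sets into two non-empty, unlabeled groups: 2^(N-1) - 1."""
    return 2 ** (n_probe_sets - 1) - 1


def bonferroni(p_value, n_probe_sets):
    """Multiply the raw p value by the number of possible groupings, capped at 1."""
    return min(1.0, p_value * n_groupings(n_probe_sets))
```

For a gene with four probe sets there are seven possible groupings, so a raw p value of 0.001 becomes 0.007 after correction.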

RESULTS
Identification of TCERG1 Targets in HEK293T Cells-We set out to identify cellular targets of TCERG1 using a combination of siRNA-mediated knockdown and en masse gene expression analysis. To this end, HEK293T cells stably expressing EGFP were used for siRNA-mediated knockdown of TCERG1, allowing the heterologously expressed EGFP to be targeted by siRNA as a negative control. Mock-transfected cells were an additional negative control and were considered as base-line expression. Three independent siRNA duplexes specific for TCERG1 were transfected at a final concentration of 10 nM, and all significantly lowered TCERG1 levels, with TCERG1-B and TCERG1-C giving the best knockdown. In Mock-treated cells and in those transfected with EGFP siRNA, TCERG1 levels did not change (Fig. 1A, left panel). The EGFP siRNA was fully functional as demonstrated by fluorescence-activated cell sorter analysis, which confirmed reduced EGFP levels after 72 h (Fig. 1A, right panel). This experiment was repeated three times with similar results. Total RNA from the controls, Mock and siEGFP, and the two siRNAs with the best knockdown, TCERG1-B and -C, were used for subsequent global mRNA quantification. We chose the 72-h time point, because TCERG1 levels had been significantly depleted for at least 24 h.
The analysis was carried out separately to derive the Down gene set (genes whose level decreased upon TCERG1 knockdown) and the Up gene set (genes whose level increased upon TCERG1 knockdown). To derive the Down gene set, we compared the Mock and EGFP conditions and excluded from the analysis any genes that decreased 1.2-fold or more in the EGFP condition (Table 1, footnote a). From the remaining genes, potential targets were identified as those that decreased 1.2-fold or more when condition Mock was compared with both condition TCERG1-B and condition TCERG1-C. To derive the Up gene set, we used the same process, varying only the direction of the change (Table 1, footnote b). These criteria were set to cast a wide net, emphasizing reproducibility over fold change. It should be noted that the 1.7-fold reduction in TCERG1 transcript, as reflected in the microarrays, resulted in an average 2.75 ± 0.75-fold reduction in protein levels as determined by semi-quantitative Western blot of the three experiments (data not shown).
More stringent criteria were used to identify probe sets that increased or decreased ≥1.5-fold, and all of the examples described below (see Fig. 3) fell into this more stringent list of targets.
Using two independent TCERG1-specific siRNA duplexes, and defining targets as those genes that change in common between them, allowed us to minimize false positives arising from siRNA-specific off-target effects. The EGFP knockdown served as an additional filter to remove genes that change merely as a result of an activated siRNA response.
The analysis described above and summarized in Table 1 resulted in the identification of 554 probe sets, representing 487 unique genes, that decreased and 485 probe sets, representing 432 unique genes, that increased upon TCERG1 depletion (supplemental Tables 1 and 2).
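The filtering logic used to derive the Down and Up gene sets can be sketched as below, assuming each comparison is summarized as one log2 ratio per probe set (knockdown relative to Mock). How replicates were combined is not stated in the text, and the function name is hypothetical. Passing `threshold=math.log2(1.5)` gives the more stringent ≥1.5-fold list.

```python
import math

FOLD_1_2 = math.log2(1.2)  # a 1.2-fold change expressed in log2 units


def derive_gene_sets(egfp, kd_b, kd_c, threshold=FOLD_1_2):
    """Return (down, up) sets of probe set identifiers.

    Each argument maps probe set -> log2(condition / Mock) for the EGFP, TCERG1-B,
    and TCERG1-C knockdowns. A probe set is called only if it changes by at least
    the threshold in both TCERG1 knockdowns and does not change in that direction
    by the same amount in the EGFP control.
    """
    down, up = set(), set()
    for probe, lfc_egfp in egfp.items():
        if probe not in kd_b or probe not in kd_c:
            continue  # probe set missing from one of the comparisons
        lfc_b, lfc_c = kd_b[probe], kd_c[probe]
        # Down set: not decreased by the control siRNA, decreased by both TCERG1 siRNAs
        if lfc_egfp > -threshold and lfc_b <= -threshold and lfc_c <= -threshold:
            down.add(probe)
        # Up set: the same rule with the direction of change reversed
        if lfc_egfp < threshold and lfc_b >= threshold and lfc_c >= threshold:
            up.add(probe)
    return down, up
```

Requiring agreement between two independent TCERG1 siRNAs, while screening out changes caused by the EGFP siRNA, is what makes the criteria favor reproducibility over fold change.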

Utilizing TCERG1 Knockdown in HeLa Cells as Validation of Cellular Targets of TCERG1-In our quest to identify genuine targets of TCERG1, we performed a TCERG1 knockdown experiment utilizing HeLa cells stably expressing firefly luciferase (HeLa-Luc), which have a different origin than HEK293T cells. In addition to changing cell lines, the experiments in HeLa cells utilized the TCERG1 siRNA duplex TCERG1-A, which was not used in the HEK293T analysis (Fig. 1B). We reasoned that targets identified in both HEK293T and HeLa cells using different siRNAs could be considered bona fide TCERG1 targets.
To identify TCERG1 targets shared by HEK293T and HeLa cells, we used Gene Set Enrichment Analysis (GSEA) (23,28). GSEA is useful when comparing a defined gene set to the rank order of another microarray experiment. Its utility hinges on the ability to quantify and visualize the distribution of the defined gene set within the data of another microarray comparison. Because it relies on this distribution, GSEA sidesteps the issue of fold changes that vary between cell types. Specifically, the objective of the software is to determine whether genes in a set S occur more frequently at the top or bottom of a ranked list L. The program provides an enrichment score based on a weighted Kolmogorov-Smirnov statistic (23) and also defines the leading edge subset of S, which is interpreted as the core subset of S responsible for the enrichment score. In our case, set S was either the Up gene set (S_Up) or the Down gene set (S_Dn) from HEK293T cells following TCERG1 knockdown (Table 1), and the rank order list L was a continuous ranking of all probe sets correlated to the level of TCERG1 in HeLa cells. Before performing this comparison between cell lines, we carried out a test of internal consistency by analyzing the HEK293T data with GSEA. As required by the method, we created the following two conditions: TCERG1(+)_293 (n = 6) was derived from the control conditions, Mock (n = 3) and EGFP (n = 3), and TCERG1(−)_293 (n = 6) was derived from the knockdown conditions TCERG1-B (n = 3) and TCERG1-C (n = 3). These two conditions were used to construct the rank order list L_293 = TCERG1(+)_293 versus TCERG1(−)_293. As expected, S_Up was enriched in condition TCERG1(−)_293 (Fig. 2A, left panel), and S_Dn was enriched in condition TCERG1(+)_293 (right panel). This exercise gave us confidence that GSEA could be applied to compare the results from HeLa and HEK293T cells.
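To make the enrichment score concrete, the sketch below computes the standard GSEA running-sum statistic for one gene set: walking down the ranked list, the sum rises at members of S (in proportion to |metric| when the weight is 1) and falls by a constant step at non-members, and the enrichment score is the maximum deviation from zero. The leading edge subset is taken as the members of S encountered up to the peak (or from the peak onward, for a negative score). This is an illustrative sketch rather than GSEA-P itself; it omits the permutation-based p values, the normalized enrichment score, and the FDR, and the function name is hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score and leading-edge subset.

    ranked: list of (gene, metric) pairs sorted from the most positively to the most
    negatively correlated gene. weight=1.0 corresponds to the weighted statistic.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(metric) ** weight for (gene, metric), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    running, es, peak = 0.0, 0.0, 0
    for i, ((gene, metric), hit) in enumerate(zip(ranked, hits)):
        # Step up at members of S (weighted by |metric|), step down otherwise
        running += abs(metric) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es, peak = running, i
    # Leading edge: members of S up to the peak (positive ES) or from it onward (negative ES)
    window = range(peak + 1) if es >= 0 else range(peak, len(ranked))
    leading_edge = [ranked[i][0] for i in window if hits[i]]
    return es, leading_edge
```

A gene set concentrated at the top of the list yields a positive score, and one concentrated at the bottom yields a negative score, which is how enrichment in one condition or the other is read.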
We then applied GSEA to the HEK293T-HeLa comparison, keeping S = S_Up or S_Dn (from HEK293T cells). To create a rank list L_HeLa, we carried out the following experiment. HeLa cells were transfected with the TCERG1-specific siRNA TCERG1-A, or with Luc siRNA, which targets the luciferase transcript, using a two-hit protocol (see "Experimental Procedures"). At 48 and 72 h following the second hit, total RNA and protein were harvested. This experiment was done twice, and both times TCERG1 protein levels were significantly reduced at both 48 and 72 h (Fig. 1B). The RNA samples, derived from the two independent experiments, were quantified using Affymetrix human U133A GeneChip arrays, and the data were used to create the new rank order list L_HeLa = TCERG1(+)_HeLa versus TCERG1(−)_HeLa. Condition TCERG1(+)_HeLa (n = 4) combined the 48- and 72-h luciferase knockdowns from the two experiments, whereas condition TCERG1(−)_HeLa (n = 4) combined the 48- and 72-h TCERG1 knockdowns. The top of the list represents probe sets positively correlated with the first condition, TCERG1(+)_HeLa; these were the probe sets that go down upon TCERG1 knockdown in HeLa cells (Fig. 2B). The bottom of the list represents probe sets negatively correlated with TCERG1(+)_HeLa; these were the probe sets that go up upon TCERG1 knockdown in HeLa cells (Fig. 2B). When we applied GSEA to L_HeLa using S_Up, the 485-member Up gene set demonstrated enrichment in condition TCERG1(−)_HeLa with a leading edge subset of 131 probe sets (Fig. 2B, left panel). When GSEA was applied to S_Dn, the 554-member Down gene set demonstrated significant enrichment in condition TCERG1(+)_HeLa (p = 0.05; FDR = 0.1) with a leading edge subset of 264 probe sets contributing to the core enrichment (Fig. 2B, right panel). Heat maps displaying the correlation of the 50 most enriched members of S for each output are shown to the right of each panel in Fig. 2.
These 131 probe sets, representing 123 gene targets up-regulated upon TCERG1 depletion (i.e. requiring TCERG1 for decreased expression), and 264 probe sets, representing 226 down-regulated gene targets (i.e. requiring TCERG1 for increased expression), are defined here as the "highest confidence" targets of TCERG1, and we refer to these as belonging to our target list (Table 2 and supplemental Tables 3 and 4).
TCERG1 Depletion Results in Changes in mRNA Processing-Whereas in some cases (e.g. RBM3, which was down-regulated by 2.1-fold) we noted changes in the overall level of transcripts, we also noticed several instances where multiple probe sets assaying the same gene did not behave consistently. In the case of EMS1(CTTN), which was the most up-regulated TCERG1-responsive target in HEK293T cells and was present among the 131-member highest confidence list defined by GSEA, there are four probe sets. Although three probe sets, which queried exonic sequences, did not respond appreciably to TCERG1 knockdown, the probe set that identified EMS1(CTTN) as the most affected (4.8-fold up-regulated) by TCERG1 knockdown was found to query sequences within intron 4. As shown in Fig. 3, RT-PCR amplification of CTTN mRNA using oligo(dT) priming for the RT step and PCR primers designed to sequences in exon 2 and to intronic sequences downstream of exon 4 yielded a product that increased upon TCERG1 knockdown. This product was sequenced and identified as a CTTN transcript with retained intron 4 sequences. The product of amplification from exon 2 to exon 4 of CTTN did not change upon TCERG1 knockdown, demonstrating that the effect of TCERG1 was specific to one isoform of CTTN mRNA (Fig. 3).
BUB3 is interrogated by four Affymetrix probe sets; however, only two of these changed upon TCERG1 knockdown in HEK293T cells, with one of them (down-regulated by 1.7-fold) passing through the HeLa GSEA filter. Careful examination of the BUB3 sequences revealed that the two probe sets most affected by TCERG1 knockdown interrogated sequences present only when a particular 3′ splice site is utilized. Use of an alternate 3′ splice site upon TCERG1 knockdown would therefore decrease the signal from these probe sets. Indeed, amplification of BUB3 transcripts with primers designed to visualize this event revealed a change in 3′ splice site usage upon TCERG1 knockdown in HEK293T cells (Fig. 3). These data suggested that many changes in mRNA levels of TCERG1 targets, as reported by Affymetrix microarray analysis, could represent changes in RNA processing.
We also examined the effect of TCERG1 depletion on the splicing of the fibronectin EDI exon. Although the EDI exon is not interrogated directly by the microarray experiments described above, splicing of this exon has been shown previously to be sensitive to alterations in transcription elongation (29,30). Skipping of this exon is stimulated by high elongation rates. Depletion of TCERG1 by siRNA treatment of Hep3B cells transfected with reporter minigenes provoked an increase in EDI inclusion independently of the promoter used (cytomegalovirus or mFN) (Fig. 4). These data, obtained with a well characterized alternative splicing reporter, provided additional confirmation of the effects of TCERG1 depletion on alternative processing.

TCERG1 Knockdown Results in Prevalent Changes in mRNA Processing-The Affymetrix U133A series of GeneChip arrays has 4,642 genes with two or more probe sets. The presence of multiple probe sets makes it possible to observe isoform-specific changes. To this end, we developed a program, SplicerAV, to predict genes with a high likelihood of alternative processing by analyzing the behavior of their probe sets using a phenotype-correlated expression data set.
SplicerAV determined whether the log fold changes for the group of probe sets for a given gene varied in their distribution (see "Experimental Procedures" for determination of log fold change and signal-to-noise ratio). In other words, SplicerAV determined whether the probe sets distribute into one or two groups. If the log fold changes for all probe sets for a given gene distributed in one group, we concluded that no change in processing was detected by these probe sets. If, however, the distribution of the log fold changes was best described by two groups, we suspected an alternative processing event. To identify and rank the genes suspected of alternative processing, we generated an Odds Score using the log fold change in expression for each probe set weighted by a function of its signal-to-noise ratio. The Odds Score was defined as the ratio of the likelihood that the probe sets were described by two events to the likelihood that they were described by one event. The lowest possible Odds Score for a gene was 1, which indicated that all probe sets for a given gene behaved identically and provided no evidence of alternative processing. An Odds Score >1 indicated some discrepancy in the behavior of the probe sets, which could be caused by an alternative processing event. The greater the Odds Score, the higher that gene ranked in the list of possible alternative processing candidates.
Comparison of HEK293T knockdown TCERG1(+)_293 versus TCERG1(−)_293 was used to generate and rank Odds Scores for the 4,642 genes on the array with two or more probe sets. CTTN and BUB3, which we had shown are alternatively processed in response to TCERG1 depletion, were ranked first and second on the list (supplemental Table 5), providing validation that SplicerAV could identify alternatively processed genes from Affymetrix gene-based microarray data.
FIGURE 3. TCERG1 affects alternative mRNA processing. cDNA generated by oligo(dT)-primed reverse transcription of total RNA from Mock, EGFP, TCERG1-B, or TCERG1-C samples from HEK293T-EGFP experiments (Exp) 1-3 was PCR-amplified using gene-specific primers. RBM3 was amplified from exon 5 to exon 7. The CTTN message with retained intron ("CTTN-retained intron") was amplified using a forward primer in exon 2 and a reverse primer in intron 4. CTTN was amplified using the same exon 2 primer and a reverse primer in exon 4. BUB3 was amplified using a forward primer in exon 7 and a reverse primer in exon 8, which resulted in two products that differ in exon 8 3′ splice site (ss) choice. β-Actin was amplified as a control.
We examined our top 12 predictions using two approaches, statistical (generation of p values) and experimental (semi-quantitative RT-PCR), and the results are summarized in Table 3. The statistical approach derived a p value for the predicted probe set distributions using the microarray expression values (see "Experimental Procedures"). Ten of the top 12 predictions had p values <0.01, supporting the robustness of the program's predictions (Table 3). Of these top 10 significant predictions, 8 generated readily testable hypotheses. In addition to CTTN and BUB3, three additional genes among these eight were experimentally shown to undergo the predicted alternative processing. ACACA (2.3-fold up-regulated) demonstrated alternative exon inclusion, and the changes in PPP3CB (1.6-fold down-regulated) and SYNCRIP (1.5-fold up-regulated) could be explained by alternate polyadenylation sites (Fig. 5). Of the three remaining genes, MTCP1 was not amenable to RT-PCR, whereas ASAH1 and APPBP2 did not appear to be alternatively processed. The predicted alternative processing of RABGGTB, which had a probe set that was down-regulated by 1.6-fold and was ranked number 43 by SplicerAV, was also validated; the change in RABGGTB expression upon TCERG1 knockdown could best be explained by alternative polyadenylation site usage (Fig. 5). Two of the top 10 significant predictions did not generate a testable hypothesis: MAP2K5 probe set behavior was uninterpretable, and one of the two RBM3 probe sets was poorly annotated and not specific for any curated RBM3 transcript.
We also used SplicerAV to ask whether the effects of TCERG1 knockdown were widespread. If so, knockdown would significantly increase the number of genes predicted to have high Odds Scores. We compared the distributions of Odds Scores for Mock (n = 3) versus EGFP (n = 3), Mock (n = 3) versus TCERG1-B (n = 3), and Mock (n = 3) versus TCERG1-C (n = 3). We visualized these distributions using a Kaplan-Meier plot (survival plot), and both TCERG1 knockdown conditions resulted in a greater number of genes displaying high Odds Scores than the control EGFP knockdown (Fig. 6A). To control for the correlation between log fold change and Odds Score, we generated sub-distributions matched for maximum log fold change, referred to as subLs (Fig. 6B and "Experimental Procedures"). To do this, all 4,642 genes from the original Mock versus EGFP and the original Mock versus TCERG1-B or -C comparisons were ranked by each gene's maximum absolute log fold change. Each of these master lists L was methodically scanned for genes with similar maximum log fold changes, and these genes were drawn from each master list L to generate a subL. The subLs were therefore closely matched by maximum absolute log fold change (Probe Score) for the pair of master distributions being examined (Fig. 6B). Survival plots of the Odds Scores were generated from these matched pairs of subLs: Mock versus EGFP and Mock versus TCERG1-B (Fig. 6C); and Mock versus EGFP and Mock versus TCERG1-C (Fig. 6D). In each of these two comparisons, the top 100 Odds Scoring genes from each condition were compared using a KS test. Mock versus TCERG1-B generated significantly higher Odds Scores than Mock versus EGFP (one-sided KS test, p = 1.39 × 10^−11). In the second comparison, Mock versus TCERG1-C also generated significantly higher Odds Scores than Mock versus EGFP (one-sided KS test, p = 3.15 × 10^−3).
When Mock versus TCERG1-B and Mock versus TCERG1-C were compared with each other, we observed no significant difference in Odds Scores (two-sided KS test, p = 0.37) (data not shown).
This analysis demonstrated that TCERG1 knockdown resulted in a higher Odds Score when compared with EGFP knockdown, and we interpret these data as evidence for a prevalent involvement of TCERG1 in alternative processing of cellular mRNAs.
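Because no observations are censored, the Kaplan-Meier (survival) curve of Odds Scores reduces to the empirical fraction of genes scoring at or above each threshold. A minimal sketch, with hypothetical function names, of how such a curve and the top-100 selection used for the KS comparisons could be computed:

```python
from bisect import bisect_left


def survival_curve(odds_scores):
    """Empirical survival function: fraction of genes with Odds Score >= each threshold.

    With no censoring, the Kaplan-Meier estimate reduces to this empirical curve.
    """
    ordered = sorted(odds_scores)
    n = len(ordered)
    # For each distinct score t, count the genes at or above t
    return [(t, (n - bisect_left(ordered, t)) / n) for t in sorted(set(ordered))]


def top_scores(odds_scores, n=100):
    """The n highest Odds Scores, e.g. for the top-100 KS comparison."""
    return sorted(odds_scores, reverse=True)[:n]
```

A condition whose curve stays higher at large thresholds has more genes with high Odds Scores, which is the pattern described for the TCERG1 knockdowns relative to the EGFP control.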
GSEA Analysis Identifies miRNA-binding Site Enrichment in Target Genes-Using GSEA, we sought to determine whether genes affected by TCERG1 levels shared any commonality that could shed additional light on TCERG1 function. Although this study has used GSEA to query one gene set at a time, GSEA was designed to query a file of many gene sets at once. The Broad Institute has made available a motifs gene set file (c3.v2.symbols.gmt) that includes 780 gene sets containing between 15 and 500 members, each sharing a common sequence motif. Each phenotype of the correlated data set, L_293 = TCERG1(+)_293 versus TCERG1(−)_293, was assessed for enrichment of any of these 780 motif gene sets. The TCERG1(+)_293 phenotype did not display significant enrichment for any motif gene set; however, the TCERG1(−)_293 phenotype displayed enrichment of 33 gene sets with an FDR <25% and p values <0.01 (Table 4). Of these 33 gene sets, 21 (64%) were those defined as containing genes with a predicted miRNA-binding site.
An independent computational approach, List to List Comparison (L2L) (31), also demonstrated significant miRNA target enrichment in the 485-member Up set while showing none in the larger 554-member Down set (supplemental Table 6). These data demonstrate that among genes down-regulated by TCERG1 (i.e. up-regulated upon its knockdown), there is a significant enrichment of genes predicted to bind, and presumably be regulated by, microRNAs. These data suggest that TCERG1 may regulate mRNA levels via a mechanism involving miRNAs. This may be true of the set of genes for which TCERG1 alters processing of alternative 3′-UTRs.
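The motif screen relies on gene set files in GMT format, a tab-delimited text format in which each line holds a gene set name, a description, and the member genes. The sketch below, with hypothetical function names, parses such lines, keeps sets of 15 to 500 members (the size range stated for c3.v2.symbols.gmt), and applies the cut-offs used above (nominal p < 0.01 and FDR < 25%) to a dictionary of results; that dictionary stands in for GSEA output and is an assumption, not the format GSEA-P writes.

```python
def parse_gmt(lines, min_size=15, max_size=500):
    """Parse GMT lines (name <TAB> description <TAB> gene1 <TAB> gene2 ...).

    Keeps gene sets whose size lies within [min_size, max_size].
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        genes = {g for g in fields[2:] if g}  # ignore empty trailing fields
        if min_size <= len(genes) <= max_size:
            gene_sets[fields[0]] = genes
    return gene_sets


def significant_sets(results, p_max=0.01, fdr_max=0.25):
    """results maps gene set name -> (nominal p value, FDR q value)."""
    return sorted(name for name, (p, q) in results.items() if p < p_max and q < fdr_max)
```

Both cut-offs must hold for a gene set to be reported, mirroring the two criteria stated for the 33 enriched motif gene sets.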

DISCUSSION
TCERG1 was discovered in 1997, and despite extensive biochemical and functional characterization, its role in vivo has remained elusive. As a means to ascertain the function of TCERG1, we sought to identify the cellular genes that are responsive to alterations in TCERG1 protein levels. Our strategy, which combined RNAi-mediated knockdown in both HEK293T and HeLa cells followed by microarray analysis, resulted in a list of "high confidence" cellular targets of TCERG1 and demonstrated a functional link between TCERG1 and splicing in vivo. This study demonstrates that decreases in TCERG1 protein levels can both up-regulate and down-regulate expression of cellular gene products. Although this functional analysis unambiguously identifies gene products that depend on TCERG1, it does not discriminate between several potential mechanisms. The low overall fold changes observed for the targets (average 1.4-fold) suggest that TCERG1 may act through a mechanism not easily reported by microarrays designed for transcriptome-based studies. It is possible that TCERG1 interacts with the nascent transcript (or ribonucleoprotein) and directly alters splicing decisions, which would be consistent with independent effects on transcription elongation and alternative processing. Alternatively, TCERG1 could work at the interface of RNAPII and the splicing machinery, exerting an effect on processing that is functionally coupled to its effects on transcription. It is also possible that TCERG1 affects only transcription directly and that all of the processing effects are consequences of altered transcription. Finally, TCERG1 could control other regulators that in turn alter several of the targets.
TCERG1 depletion results in an increase in the levels of predicted targets of microRNAs (Table 4 and supplemental Table 6). It is possible that TCERG1 is directly involved in the expression of miRNAs, so that upon depletion of TCERG1, decreased miRNA expression results in an increase in target mRNAs. Alternatively, TCERG1 could regulate miRNA targets by altering the availability of the target sites, through alternative mRNA processing that generates different 3′-UTRs. Indeed, given the bias of the HG-U133 microarrays, which preferentially interrogate the 3′ ends of transcripts, we suggest that the TCERG1 targets identified here will be enriched in those with alternative 3′-UTRs. It is also possible that a target of TCERG1 is responsible for the enrichment via an indirect mechanism. In fact, RBM3, the most down-regulated gene upon TCERG1 knockdown in HEK293T cells, has been shown to affect cellular miRNA levels (32). Although the mechanism remains to be elucidated, our observations suggest that TCERG1 levels can markedly affect miRNA targets.
RBM3 is also involved in regulation of translation in neuronal cells (33) and is down-regulated by polyglutamine expression (34). RBM3 overexpression significantly protected cells from polyglutamine-induced toxicity, suggesting a role in Huntington disease (HD) pathology (34). Interestingly, TCERG1 has been suggested as a genetic modifier of HD (35–37) and has been shown to be protective in models of HD neurotoxicity (38). The ability of TCERG1 to affect alternative processing of cellular mRNA, and specifically the expression of RBM3, suggests a mechanism whereby TCERG1 could influence HD progression.

TCERG1 Alters the Processing of Cellular mRNA
MARCH 21, 2008 • VOLUME 283 • NUMBER 12

Fig. 4. The remaining validated gene targets are shown above, with A–C from the top 12 and D ranked 43rd. Each gene target is shown as a schematic of the predicted alternative processing hypotheses, generated by combining SplicerAV output with the genomic alignment of the interrogated probe sets. Arrows indicate a greater than 20% change in probe set expression. Below the predicted behavior is a schematic of the primers used for experimental RT-PCR validation, along with the predicted products. To the right are quantifications of both the predicted hypotheses and the experimental RT-PCR. The quantifiable predictions were made by averaging the expression of the probe sets that interrogated regions corresponding to the predicted product. Both the microarray data and the RT-PCR data were obtained using the TCERG1(+) 293 (n = 6) versus TCERG1(−) 293 (n = 6) experimental conditions.

These genes are then sorted in descending order of MLFC (maximum log fold change) to create a master distribution for each treatment (e.g. Mock versus EGFP). A sub-distribution, subL, of each master distribution is then created, using an initial MLFC cutoff of 1. Starting with the Mock versus EGFP list, the first gene with an MLFC below 1 is added to the subL being generated from EGFP (subL Mock versus EGFP). The MLFC of this gene is then set as the new lower cutoff, which is used to select the next gene with a lower MLFC from the Mock versus TCERG1-C distribution; this gene is added to the subL being generated from Mock versus TCERG1-C (subL Mock versus TCERG1-C). In this way genes are drawn alternately from either distribution, with a lower MLFC each time, generating two subLs that are matched for maximum log fold changes. Dots within the original distributions represent runs of consecutive genes omitted for space; they indicate that the original Mock versus TCERG1-C distribution has overall higher maximum log fold changes than the Mock versus EGFP distribution. C, survival plot of Odds Scores for subL EGFP and subL TCERG1-B. D, survival plot of Odds Scores for subL EGFP and subL TCERG1-C.
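The sub-list matching described in the figure legend, which draws genes alternately from two MLFC-sorted distributions so that each pick falls below the previous pick's MLFC, can be sketched as follows. This is an illustrative reading and not the authors' code. The strict "below the cutoff" comparison and the rule of stopping as soon as the list whose turn it is has no qualifying gene are assumptions. The gene names and MLFC values are hypothetical.

```python
from typing import List, Tuple

Gene = Tuple[str, float]  # (gene identifier, maximum log fold change)


def matched_sublists(dist_a: List[Gene], dist_b: List[Gene],
                     start_cutoff: float = 1.0) -> Tuple[List[Gene], List[Gene]]:
    """Draw genes alternately from two MLFC distributions, starting with dist_a;
    each pick must have an MLFC strictly below the previous pick's MLFC."""
    # Master distributions, sorted by descending MLFC.
    lists = [sorted(dist_a, key=lambda g: g[1], reverse=True),
             sorted(dist_b, key=lambda g: g[1], reverse=True)]
    subs: Tuple[List[Gene], List[Gene]] = ([], [])
    pos = [0, 0]          # next unexamined index in each sorted list
    cutoff = start_cutoff
    turn = 0              # 0 -> draw from dist_a, 1 -> draw from dist_b
    while True:
        src, i = lists[turn], pos[turn]
        # Skip genes at or above the cutoff; because the cutoff only falls,
        # skipped genes can never qualify later.
        while i < len(src) and src[i][1] >= cutoff:
            i += 1
        if i == len(src):
            break  # no qualifying gene left; sub-lists may differ by one gene
        gene = src[i]
        subs[turn].append(gene)
        cutoff = gene[1]  # this pick's MLFC becomes the new lower cutoff
        pos[turn] = i + 1
        turn = 1 - turn
    return subs


# Hypothetical MLFC values, for illustration only.
mock_vs_egfp = [("g1", 1.5), ("g2", 0.9), ("g3", 0.7), ("g4", 0.3)]
mock_vs_kd = [("h1", 0.95), ("h2", 0.8), ("h3", 0.5), ("h4", 0.2)]
sub_egfp, sub_kd = matched_sublists(mock_vs_egfp, mock_vs_kd)
print([g for g, _ in sub_egfp], [g for g, _ in sub_kd])
# ['g2', 'g3', 'g4'] ['h2', 'h3', 'h4']
```

Because the cutoff only decreases, each sorted list is scanned once, and genes above the starting cutoff of 1 (g1, h1) never enter either sub-list.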

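The Fig. 4 quantification marks probe sets that change by more than 20% and estimates each predicted product by averaging the probe sets that interrogate it. It can be sketched as follows. Several details are assumptions: the probe-set identifiers, the use of linear-scale condition means (e.g. averages over the n = 6 replicates), and reading "greater than 20% change" as an absolute percentage change relative to TCERG1(+). The values are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable


def percent_change(control: float, knockdown: float) -> float:
    """Percentage change from the TCERG1(+) value to the TCERG1(-) value."""
    return 100.0 * (knockdown - control) / control


def arrow_flags(control: Dict[str, float], knockdown: Dict[str, float],
                threshold: float = 20.0) -> Dict[str, str]:
    """Mark probe sets whose expression changes by more than `threshold` percent."""
    flags = {}
    for probe_set, c in control.items():
        pc = percent_change(c, knockdown[probe_set])
        if abs(pc) > threshold:
            flags[probe_set] = "up" if pc > 0 else "down"
    return flags


def product_level(expr: Dict[str, float], probe_sets: Iterable[str]) -> float:
    """Predicted level of one product: mean of the probe sets covering its region."""
    return mean(expr[ps] for ps in probe_sets)


# Hypothetical probe sets and condition means (linear scale), for illustration only.
ctrl = {"ps_a": 100.0, "ps_b": 100.0, "ps_c": 100.0}
kd = {"ps_a": 130.0, "ps_b": 90.0, "ps_c": 70.0}
print(arrow_flags(ctrl, kd))                 # {'ps_a': 'up', 'ps_c': 'down'}
print(product_level(kd, ["ps_a", "ps_b"]))   # 110.0
```

Comparing product_level for the same probe sets in both conditions gives a predicted ratio that can be set beside the RT-PCR quantification.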
Accumulating evidence suggests a role for TCERG1 in coupling transcription to splicing, and TCERG1 fulfills a number of criteria required of such a factor. TCERG1 interacts with the CTD of RNAPII and preferentially binds a phosphorylated CTD (5). TCERG1 overexpression affects elongation in a promoter-specific fashion (3). Changes in promoter context and in the elongation rate of transcription are known to affect splicing decisions (39). Reciprocally, addition of splice sites to a transcribed sequence has been shown to affect transcription (40, 41). TCERG1 has been identified as a spliceosome component in multiple studies (7–9, 42). Immunolocalization on polytene chromosomes demonstrates a marked accumulation of the C. tentans TCERG1 homolog (hrp130) at Balbiani ring 3, an area of active transcription and remarkably high intron density (18). The authors postulated that hrp130 was recruited to modulate elongation and thereby facilitate splicing (18). The work reported here provides the strongest evidence yet that TCERG1 is involved in splicing of cellular mRNAs.
Although the gene-level Affymetrix HG-U133 series of microarrays is not intended to report isoform-specific changes in mRNA, we have demonstrated that careful analysis of these data can reveal such changes. Using SplicerAV, we showed that TCERG1 levels can have prevalent effects on the levels of specific mRNA isoforms. Although only a limited number of probe sets can report these differences, conventional Affymetrix GeneChip arrays are the predominant microarray platform used by the scientific community for comparative expression studies, and the archived data derived from these studies are voluminous. SplicerAV therefore has broad application for reanalyzing this wealth of available microarray data for potential alternative processing.
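SplicerAV's own scoring procedure is not reproduced in this section. The sketch below instead uses a splicing-index-style heuristic to illustrate the general idea of mining gene-level, 3′-biased arrays for isoform-specific changes. Each probe set's knockdown/control log2 ratio is compared with the median ratio of all probe sets for the same gene, and probe sets that deviate strongly are flagged as candidate alternative processing events. The gene-to-probe-set mapping, the values, and the 0.5 log2 threshold are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def splicing_index(control: Dict[str, float], knockdown: Dict[str, float],
                   gene_to_probe_sets: Dict[str, List[str]]) -> Dict[str, float]:
    """Per probe set: log2(knockdown/control) minus the median log2 ratio of all
    probe sets for the same gene. Genes with a single probe set are skipped,
    because an isoform-specific change cannot be separated from a gene-level one."""
    log_ratio = {ps: math.log2(knockdown[ps] / control[ps]) for ps in control}
    index = {}
    for gene, probe_sets in gene_to_probe_sets.items():
        if len(probe_sets) < 2:
            continue
        gene_level = median(log_ratio[ps] for ps in probe_sets)
        for ps in probe_sets:
            index[ps] = log_ratio[ps] - gene_level
    return index


def candidate_events(index: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Probe sets whose deviation from the gene-level change exceeds the threshold (log2)."""
    return sorted(ps for ps, value in index.items() if abs(value) > threshold)


# Hypothetical data: gene G has three probe sets, one of which (e.g. covering an
# alternative 3'-UTR) drops fourfold; gene H has only one probe set.
ctrl = {"G_1": 100.0, "G_2": 100.0, "G_3": 100.0, "H_1": 100.0}
kd = {"G_1": 100.0, "G_2": 100.0, "G_3": 25.0, "H_1": 50.0}
si = splicing_index(ctrl, kd, {"G": ["G_1", "G_2", "G_3"], "H": ["H_1"]})
print(candidate_events(si))  # ['G_3']
```

Using the median rather than the mean as the gene-level reference keeps a single strongly changing probe set from dragging the reference toward itself. Gene H illustrates the limitation noted above: genes interrogated by only one probe set cannot report isoform-specific changes.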