Genome-wide Analysis of Alternative Pre-mRNA Splicing

Alternative splicing of mRNA precursors allows the synthesis of multiple mRNAs from a single primary transcript, significantly expanding the information content and regulatory possibilities of higher eukaryotic genomes. High-throughput enabling technologies, particularly large-scale sequencing and splicing-sensitive microarrays, are providing unprecedented opportunities to address key questions in this field. The picture emerging from these pioneering studies is that alternative splicing affects most human genes and a significant fraction of the genes in other multicellular organisms, with the potential to greatly influence the evolution of complex genomes. A combinatorial code of regulatory signals and factors can deploy physiologically coherent programs of alternative splicing that are distinct from those regulated at other steps of gene expression. Pre-mRNA splicing and its regulation play important roles in human pathologies, and genome-wide analyses in this area are paving the way for improved diagnostic tools and for the identification of novel and more specific pharmaceutical targets.

Removal of introns from pre-mRNAs is an essential step in eukaryotic gene expression (see Fig. 1A). Alternative patterns of intron removal allow a single gene to give rise to multiple mRNAs encoding different proteins (see Fig. 1B). This minireview focuses on how the recent application of high-throughput technologies is providing new perspectives on four outstanding questions in the field of alternative pre-mRNA splicing (AS): its prevalence, its biological relevance, its regulatory mechanisms, and its impact on disease.

Prevalence
Although early estimates suggested that AS affects only a small fraction of human genes, large-scale genome and transcriptome sequencing projects, which allow extensive alignment of mRNA sequences with genomic sequences (see Fig. 2A), indicate that the majority of human genes are alternatively spliced (1). Splicing-sensitive microarrays (see Fig. 2B) provide independent confirmation of this high incidence: using a variety of experimental designs and biological samples, several such studies have produced consistent estimates that 70-80% of human genes are alternatively spliced (2-4). The production of alternatively spliced transcripts is therefore a general feature of human genes that should be taken into account in biological and medical studies, from the design of gene knock-out/knock-down experiments to molecular diagnostics and drug screens.
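The alignment-based logic behind these prevalence estimates (described in the Fig. 2A legend) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the cited studies: it assumes that each aligned transcript has already been reduced to the set of genomic exon identifiers it covers (the identifiers below are hypothetical), and it ignores alternative 5′/3′ splice sites, retained introns, and alignment errors.

```python
from typing import Dict, List, Set


def classify_exons(transcripts: List[Set[str]]) -> Dict[str, str]:
    """Label each exon observed in a gene's aligned transcripts.

    Exons present in every transcript are called constitutive; exons
    missing from at least one transcript are called alternative.
    """
    all_exons = set().union(*transcripts)
    labels = {}
    for exon in sorted(all_exons):
        n_containing = sum(1 for t in transcripts if exon in t)
        labels[exon] = "constitutive" if n_containing == len(transcripts) else "alternative"
    return labels


def fraction_alternatively_spliced(genes: Dict[str, List[Set[str]]]) -> float:
    """Fraction of genes with at least one exon classified as alternative."""
    if not genes:
        return 0.0
    n_as = sum(
        1 for transcripts in genes.values()
        if "alternative" in classify_exons(transcripts).values()
    )
    return n_as / len(genes)


# Toy input (hypothetical IDs): gene_A has an exon-skipping transcript, gene_B does not.
genes = {
    "gene_A": [{"e1", "e2", "e3"}, {"e1", "e3"}],
    "gene_B": [{"e1", "e2"}, {"e1", "e2"}],
}
print(classify_exons(genes["gene_A"]))
print(fraction_alternatively_spliced(genes))  # 0.5
```

Under this rule a gene represented by a single transcript can never be scored as alternatively spliced, so estimates obtained this way depend on how deeply each gene's transcripts have been sampled.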
In addition, in-depth analysis of individual genes frequently reveals novel AS isoforms, suggesting that transcript diversity is far from fully annotated in current databases. Furthermore, recent analyses of HapMap cell lines document variation in AS among individuals, an observation with significant basic and medical implications (5,6).
Is the prevalence of AS uniform across organisms, or does it increase with organismal complexity? In silico comparative analyses have provided contradictory answers to this question. Although some data support a higher incidence of AS in vertebrates (7), it is also clear that AS and its regulation have contributed to expanding genomic information over long evolutionary periods. For example, array experiments suggest that 40% of Drosophila genes show changes in AS during embryonic development (8), and invertebrates harbor genes with extraordinarily complex patterns of AS regulation. (A now classical example is the Drosophila Dscam gene, which can generate over 38,000 isoforms important for neural wiring and immune defense (9).)
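The isoform number quoted for Dscam is a simple product: if one exon is chosen independently from each cluster of mutually exclusive exons, the number of possible mRNAs is the product of the cluster sizes. The cluster sizes used below (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17) are taken from the published Dscam gene structure rather than from this text, and the result is an upper bound on potential isoforms, not a count of observed ones.

```python
from math import prod

# Alternatives per mutually exclusive exon cluster in Drosophila Dscam
# (values from the published gene structure; not stated in this text).
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(cluster_sizes: dict) -> int:
    """Maximum number of distinct mRNAs when exactly one exon is chosen
    independently from each cluster."""
    return prod(cluster_sizes.values())


print(isoform_count(DSCAM_CLUSTERS))  # 38016
```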

Biological Relevance
It is beyond question that AS can generate mRNAs with important and distinct biological functions (10-13). The more difficult question is what fraction of the extensive transcript variation generated by AS is truly biologically relevant and what fraction reflects stochastic noise in the splicing process.
Biological Incidence. Consistent with the idea that AS plays important roles in cellular function, bioinformatic and array data indicate that the process is more prominent in tissues with diverse cell types and among genes with regulatory functions (4,12,14). An important insight from global studies in a variety of biological situations is that the overlap between genes that show changes in AS and those regulated through changes in transcript levels is relatively limited (but see also Ref. 18). This observation suggests the existence of dedicated regulatory programs that coordinately control multiple AS events. Such programs are evident, for example, in Drosophila sex determination (19) and mammalian synaptic transmission (20,21), and they strongly argue for the biological relevance of AS.
Predicted Functional Effects. Bioinformatic analyses indicate that 75% of AS events affect coding regions, with predicted effects ranging from subtle amino acid substitutions to removal of protein motifs or protein truncations (10-12). Although compelling instances of biologically relevant changes due to a single amino acid difference exist (13,22), predicting the functional impact of the frequent small variations in protein structure generated by AS is often difficult. For instance, single amino acid insertions resulting from the use of alternative NAGNAG 3′ splice sites (which occur in up to 30% of human genes) may affect function or may simply reflect stochastic splice site choice (23). Even so, stochastic choice could offer an evolutionary testing ground for exploring novel functions, a concept that also applies to other classes of AS events.
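Tandem NAGNAG acceptors can be recognized directly from sequence. In the sketch below (Python, with hypothetical function names), the annotated 3′ splice site is given as the index of the first exonic nucleotide; if the six nucleotides ending at that site, or the six straddling it, match NAGNAG, a second AG lies exactly 3 nt away, so the two possible exons differ by three nucleotides, typically adding or removing a single amino acid. Whether the alternative site is actually used cannot be decided from sequence alone.

```python
import re
from typing import List

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def alternative_nagnag_acceptors(seq: str, acceptor: int) -> List[int]:
    """Return alternative 3' splice sites 3 nt away from the annotated one
    that form a NAGNAG tandem with it.

    `acceptor` is the 0-based index of the first exonic nucleotide, so the
    intron must end with seq[acceptor - 2:acceptor] == "AG".
    """
    seq = seq.upper()
    if acceptor < 2 or seq[acceptor - 2:acceptor] != "AG":
        raise ValueError("annotated 3' splice site must follow an intronic AG")
    alternatives = []
    # Annotated site uses the downstream AG: check the six nt ending at it.
    if acceptor >= 6 and NAGNAG.fullmatch(seq[acceptor - 6:acceptor]):
        alternatives.append(acceptor - 3)  # upstream AG gives an exon 3 nt longer
    # Annotated site uses the upstream AG: check the six nt straddling it.
    if acceptor >= 3 and NAGNAG.fullmatch(seq[acceptor - 3:acceptor + 3]):
        alternatives.append(acceptor + 3)  # downstream AG gives an exon 3 nt shorter
    return alternatives


# Invented sequence: intron ending ...TTTTCAGCAG, exon starting at index 10.
print(alternative_nagnag_acceptors("TTTTCAGCAGGTCAAA", 10))  # [7]
```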
Mapping of AS regions onto solved polypeptide structures indicates that most AS events affect coil or loop regions, often located on the protein surface (24,25). This location could reflect either an impact on functional interactions with other factors or simply which regions of a protein are most likely to tolerate amino acid changes. Although compelling examples of domain swapping by AS exist (9,22,26), bioinformatic analysis of ENCODE genes argues against a large-scale expansion of the repertoire of protein domains mediated by AS (27).
Another frequent consequence of AS, affecting 35% of mammalian genes, is the generation of mRNA variants containing premature termination codons (PTCs), which are degraded by nonsense-mediated mRNA decay (NMD) (11,12). However, a genome-wide study found that most PTC-containing splice variants are produced at low levels independently of NMD pathway function and are rarely tissue-specific or phylogenetically conserved (16). Nonetheless, recent reports highlight the remarkable evolutionary conservation and importance of this mechanism in controlling the expression of splicing regulators (28-32).
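Whether a splice variant is likely to be degraded by nonsense-mediated mRNA decay is often predicted with a position rule that is not stated in this text: in mammalian cells, a stop codon lying more than roughly 50-55 nt upstream of the last exon-exon junction generally triggers decay. The sketch below applies that heuristic with an assumed 50-nt threshold to a spliced transcript given as a list of exon sequences; it ignores known exceptions to the rule and assumes the coding start is already known.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(mrna: str, cds_start: int) -> Optional[int]:
    """Index of the first in-frame stop codon at or after cds_start, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicted_nmd_target(exons: List[str], cds_start: int, threshold: int = 50) -> bool:
    """Assumed ~50-nt rule: predict decay if the stop codon ends more than
    `threshold` nt upstream of the last exon-exon junction."""
    if len(exons) < 2:
        return False  # no exon-exon junction at all
    mrna = "".join(exons).upper()
    stop = first_stop(mrna, cds_start)
    if stop is None:
        return False
    last_junction = len(mrna) - len(exons[-1])  # index of the first nt of the last exon
    return last_junction - (stop + 3) > threshold


# Toy transcripts with invented sequences.
ptc_variant = ["ATGAAATAA", "C" * 60, "G" * 30]   # stop codon in the first exon
normal_variant = ["ATGAAA", "AAATAA" + "C" * 10]  # stop codon in the last exon
print(predicted_nmd_target(ptc_variant, 0))     # True
print(predicted_nmd_target(normal_variant, 0))  # False
```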
AS events occurring in untranslated regions of mRNAs can also lead to functional differences, for example, by including or excluding sequences recognized by factors that regulate mRNA stability or translation. Indeed, 40% of predicted microRNA target sites are located in regions of 3′-untranslated regions that are subject to AS and/or alternative polyadenylation (33).
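Quantifying this kind of overlap amounts to intersecting two sets of intervals along the 3′-untranslated region. The sketch assumes that target sites and variable regions (alternatively spliced or alternatively polyadenylated segments) have already been predicted and are given as half-open coordinate intervals; the coordinates in the example are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in 3'-UTR coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_in_variable_regions(sites: List[Interval], variable: List[Interval]) -> float:
    """Fraction of target sites overlapping any variable 3'-UTR segment."""
    if not sites:
        return 0.0
    hits = sum(any(overlaps(s, v) for v in variable) for s in sites)
    return hits / len(sites)


# Invented example: one variable segment and four predicted target sites.
variable = [(100, 400)]
sites = [(20, 27), (150, 157), (390, 397), (500, 507)]
print(fraction_in_variable_regions(sites, variable))  # 0.5
```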
Evolutionary Conservation. Conserved AS events are more likely to be functionally important: they show a higher tendency to preserve the reading frame, to modify protein-coding sequences, to conserve regulatory sequences, and to be tissue-specific (12). However, computational predictions indicate that only 10-20% of AS events are conserved between human and mouse, a figure substantially lower than the overall gene conservation between these organisms (12,34). Although this figure may increase as more extensive data sets become available, does it argue against the functional relevance of the majority of AS events? An alternative view is that AS can play important roles in the evolution of individual genes. A compelling example is the exonization of transposable elements present in the introns of some vertebrate genes; the resulting exons evolved mechanisms of limited inclusion that allow exploration of novel functions without compromising expression of the host gene (35). More generally, there is substantial evidence for the creation or loss of tens of thousands of exons across vertebrate genomes, and such exons can function as evolutionary "hot spots" (36).
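The reading-frame criterion mentioned above reduces to arithmetic: skipping or including a cassette exon leaves the downstream frame intact only when the exon length is a multiple of 3. If exon lengths were unrelated to the frame, about one-third would satisfy this by chance, so it is an excess over that baseline among conserved events that points to selection. A minimal sketch with invented exon lengths:

```python
from typing import List


def preserves_reading_frame(exon_length: int) -> bool:
    """Skipping or including a cassette exon keeps the downstream reading
    frame only if the exon length is a multiple of 3."""
    return exon_length % 3 == 0


def frame_preserving_fraction(exon_lengths: List[int]) -> float:
    """Fraction of cassette exons whose inclusion or skipping preserves the frame."""
    if not exon_lengths:
        return 0.0
    return sum(preserves_reading_frame(n) for n in exon_lengths) / len(exon_lengths)


# Invented exon lengths; lengths unrelated to the frame would give about 1/3.
print(frame_preserving_fraction([99, 100, 120, 31]))  # 0.5
```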
In summary, neither simple structure/function predictions nor extensive phylogenetic comparisons provide straightforward evidence of biological relevance for the majority of AS events. However, evidence accumulated through classical gene-by-gene studies and persuasive examples of coordinated regulation of biological processes through AS readily justify future systematic functional analyses of AS isoforms.

Regulatory Mechanisms
As elaborated in other minireviews in this series (37-39), splice site selection depends not only on the identity of the individual splice sites but also on numerous regulatory sequence motifs in the neighboring exons and introns. These motifs are recognized by proteins of the SR protein and hnRNP families, as well as by other splicing factors that associate with the RNA as it emerges from transcriptional complexes (Fig. 1B) (10-12, 40).
Bioinformatics. Splicing regulatory sequences have been identified through a variety of approaches (reviewed in Ref. 40), including (i) comparisons between exons in genes and in pseudogenes or intronless genes, (ii) phylogenetic nucleotide conservation not explained by conservation of amino acid sequence, and (iii) correlations between sequence motifs and the strength of the neighboring splice sites. Additional regulatory sequences were identified as characteristic signatures of tissue-specific splicing or of the activation of specific signaling pathways (10,12,40). Together with motifs identified as preferred binding sites for known regulatory factors, these studies suggest that a substantial fraction of exonic and flanking intronic sequences, in both constitutively and alternatively spliced exons, play roles in modulating the splicing process. Furthermore, the extent and nature of these effects sometimes depend on the position of a motif relative to splicing signals (41) or to other regulatory motifs (42).
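Approaches such as (i) and (iii) ultimately compare motif frequencies between two groups of sequences. The sketch below shows a generic version of that comparison: the log2 enrichment of each k-mer in a foreground set (e.g., exons) relative to a background set (e.g., pseudogene or intronic sequences), with a pseudocount so that unseen k-mers do not produce infinite scores. It is an illustration under these assumptions, not a reimplementation of any of the published methods reviewed in Ref. 40, which add statistical testing and further filters.

```python
from collections import Counter
from itertools import product
from math import log2
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int = 6) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_enrichment(foreground, background, k: int = 6, pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of each k-mer's pseudocounted frequency in foreground vs background."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        m: log2(((fg[m] + pseudocount) / fg_total) / ((bg[m] + pseudocount) / bg_total))
        for m in kmers
    }


# Toy sequences; k=2 keeps the example small (hexamers use k=6).
scores = kmer_enrichment(["GAGAGA"], ["CTCTCT"], k=2)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```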
Combinatorial assemblies of regulatory motifs and factors act on individual transcripts to facilitate or preclude splice site recognition by the spliceosome and thus establish patterns of AS. Cell type-specific splicing can be achieved either by expression of cell type-specific regulators or through cell type-specific variations in the levels or activity of more ubiquitous factors (10-12).
Splicing-sensitive Microarrays. Various platforms for high-throughput detection of splice variants and of changes in AS are helping to decipher these cellular codes (Fig. 2B and supplemental Table 1). First, results from these approaches have substantially increased the number of known tissue-specific AS events (4, 51-53) and of events that change under specific conditions such as T cell activation (17). They have even revealed an unexpected flexibility of the splicing machinery of the yeast Saccharomyces cerevisiae (previously thought to be largely devoid of regulated splicing) to adapt to stress conditions (43).
Splicing microarrays have also been used to identify AS events regulated upon ablation or overexpression of specific factors (20, 44-48), significantly expanding the known targets of tissue-specific (20,45) and ubiquitous (45-47) regulators. In addition, results from yeast arrays and systematic RNA interference screens in Drosophila revealed substrate-specific splicing defects and changes in AS caused by depletion of core spliceosome components, uncovering an unexpected regulatory potential of these factors (44,48,49).
Genome-wide studies have helped to evaluate the generality of mechanistic insights gained from gene-by-gene studies. For example, previous work indicated that SR protein splicing factors have redundant biochemical activities and that some members of the SR protein and hnRNP families display general antagonistic activities in AS regulation (10). Results from splicing microarrays did show changes in AS consistent with these conclusions, but they also provided extensive evidence for specific functions of individual splicing regulators and for a complex network of genetic interactions between regulators and AS events (45). In Drosophila, for instance, the targets of the soma-specific splicing regulator PSI were contained within a subset of the targets of the ubiquitous hnRNP protein Hrp48, suggesting that PSI requires Hrp48 as a cofactor, whereas Hrp48 has additional, PSI-independent functions.
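A conclusion such as "the targets of PSI are a subset of the targets of Hrp48" rests on two checks: whether one target set is contained in the other, and whether their overlap is larger than expected by chance given the number of AS events monitored. A minimal Python sketch of both checks follows; the one-sided hypergeometric test is an assumed choice of statistic, and the event identifiers and array size are invented.

```python
from math import comb
from typing import Set


def hypergeom_overlap_pvalue(total: int, n_a: int, n_b: int, overlap: int) -> float:
    """One-sided P(X >= overlap) for the overlap of two random target sets of
    sizes n_a and n_b drawn from `total` monitored AS events."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, k) * comb(total - n_a, n_b - k) for k in range(overlap, upper + 1))
    return hits / comb(total, n_b)


def compare_targets(a: Set[str], b: Set[str], total: int) -> dict:
    """Subset check plus overlap significance for two regulators' target sets."""
    shared = a & b
    return {
        "a_subset_of_b": a <= b,
        "shared": len(shared),
        "p_value": hypergeom_overlap_pvalue(total, len(b), len(a), len(shared)),
    }


# Hypothetical event IDs on a hypothetical array monitoring 100 AS events.
psi_targets = {"ev1", "ev2"}
hrp48_targets = {"ev1", "ev2", "ev3", "ev4"}
print(compare_targets(psi_targets, hrp48_targets, total=100))
```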
Finally, important cellular regulatory circuits have been uncovered using splicing microarrays. These include changes in AS relevant to cell transformation induced by the SR protein SF2/ASF (47) (see below) and the regulation of a neuron-specific form of PTB (nPTB) by its more ubiquitous paralog, a feedback mechanism that controls an extensive AS program during neural development (46).
Combining Technologies. The combined use of experimental and computational approaches has been very fruitful for identifying regulatory sequences (50,51), including motifs involved in tissue-specific AS (52,53). Interestingly, although muscle-specific sequence motifs largely overlapped with known binding sites for muscle-specific regulators (52), numerous potential novel regulatory sequences for nervous system-specific splicing were found (53), consistent with the high prevalence of AS in this tissue (4,14).
FIGURE 2. Examples of high-throughput methods used for the analysis of AS. A, transcript/genomic alignments. Genomic sequence information can be compared with mRNA sequences deposited in cDNA or expressed sequence tag (EST) libraries. Sequences present in all available mRNA sequences are assumed to correspond to constitutive exons, whereas sequences skipped in particular mRNA clones are indicative of AS. B, splicing-sensitive microarrays. Oligonucleotide probes complementary to exonic or splice junction sequences (colored short lines) are immobilized on the solid surface of the microarray. RNAs from the samples to be compared are labeled with different fluorescent dyes (green and red stars). Hybridization of the RNA samples to the array probes results in signals whose color and intensity depend upon the relative abundance of the sequence feature interrogated by each probe. (Yellow indicates equal intensity from each dye.) The combined information from the different probes monitoring an AS event allows an evaluation of the relative abundance of mRNA isoforms in the samples. Several other experimental designs and analysis methods have been developed (2-4, 14, 18, 20, 44-46, 48, 61, 65, 69, 71).
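The probe-combination step described in the Fig. 2B legend can be illustrated with one commonly used normalization: the log2 red/green ratio of the probes monitoring the alternative exon or junction is compared with the gene-level log2 ratio estimated from constitutive-exon probes. Subtracting the gene-level term separates changes in isoform ratio from changes in overall transcript abundance. This is a sketch under assumed inputs (background-corrected intensities, median summarization), not the analysis method of any particular study cited above.

```python
from math import log2
from statistics import median
from typing import List, Tuple

# (red, green) background-corrected intensities for one probe.
Probe = Tuple[float, float]


def log_ratio(probe: Probe) -> float:
    red, green = probe
    return log2(red / green)


def splicing_index(event_probes: List[Probe], constitutive_probes: List[Probe]) -> float:
    """Median log2 ratio of event probes minus median log2 ratio of constitutive
    probes. Positive values suggest relatively more of the probed isoform in the
    red-labeled sample; 0 means no change beyond the gene-level change."""
    event = median([log_ratio(p) for p in event_probes])
    gene = median([log_ratio(p) for p in constitutive_probes])
    return event - gene


# Invented intensities: the gene is 2-fold up in red, the exon probe 8-fold up.
print(splicing_index([(800.0, 100.0)], [(200.0, 100.0), (400.0, 200.0)]))  # 2.0
```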
Combinations of high-throughput, biochemical, and computational methods have allowed the identification of numerous target genes of the neuron-specific Nova splicing regulators, as well as the construction of a regulatory map that correctly predicts the effect of these factors on AS (20,21,54). The high-throughput approaches included ultraviolet cross-linking and immunoprecipitation of RNAs bound to Nova in mouse brain, which identified >30 novel RNAs from genes involved in synaptic inhibition (54). Splicing-sensitive microarrays identified 40 neuron-specific exons in a largely distinct set of synapse-related genes that showed splicing defects in the neocortex of Nova-2 knock-out mice (20). Interestingly, the protein products of a significant fraction of these genes were known to form complexes, suggesting that AS modulates the function of these complexes to shape the synapse.
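The position dependence captured by such a regulatory map can be sketched as a simple rule. As the Nova map is commonly summarized, Nova binding to YCAY motifs within the alternative exon or in the upstream intron is associated with exon skipping, whereas binding in the downstream intron is associated with inclusion. The code below encodes only that summary, as an assumption for illustration: it ignores distance from the splice sites, cluster density, and binding strength, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# YCAY (Y = C or U, written as T in DNA); the lookahead finds overlapping matches.
YCAY = re.compile(r"(?=[CT]CA[CT])")


def ycay_positions(seq: str) -> List[int]:
    """0-based start positions of all YCAY motifs in a sequence."""
    return [m.start() for m in YCAY.finditer(seq.upper())]


def site_regions(sites: List[int], exon_start: int, exon_end: int) -> Dict[str, int]:
    """Count sites upstream of, within [exon_start, exon_end), and downstream of the exon."""
    counts = {"upstream_intron": 0, "exon": 0, "downstream_intron": 0}
    for p in sites:
        if p < exon_start:
            counts["upstream_intron"] += 1
        elif p < exon_end:
            counts["exon"] += 1
        else:
            counts["downstream_intron"] += 1
    return counts


def predict_nova_effect(sites: List[int], exon_start: int, exon_end: int) -> str:
    """Simplified rule (assumed): exon/upstream sites favor skipping,
    downstream sites favor inclusion."""
    c = site_regions(sites, exon_start, exon_end)
    skipping = c["upstream_intron"] + c["exon"]
    inclusion = c["downstream_intron"]
    if skipping > inclusion:
        return "skipping"
    if inclusion > skipping:
        return "inclusion"
    return "no prediction"


print(ycay_positions("TCATCAC"))                 # [0, 3]
print(predict_nova_effect([10, 120], 100, 150))  # skipping
```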

Impact on Disease
As illustrated by the examples below, a mounting body of evidence implicates splicing defects and altered splicing regulation as causes or modifiers of numerous pathologies. Systematic screening for mRNA defects indicates that 50% of the mutations identified in patients with neurofibromatosis or ataxia-telangiectasia affect splicing of the NF1 or ATM genes, respectively (55). Noncoding nucleotide repeat expansions can also disrupt the function of splicing factors: in myotonic dystrophy, expanded CTG repeats transcribed into RNA alter the activity of CUG-binding proteins, resulting in changes in AS of other genes that can explain some of the associated symptoms (56). Overexpression of the SR protein SF2/ASF leads to efficient cell transformation and tumor formation by altering the ratios between isoforms of key regulators of cell growth (47).
High-throughput technologies for detecting AS hold the promise of improved diagnostic and prognostic tools (57). Both computational predictions and microarray experiments have identified hundreds of AS and aberrant splicing events associated with disease states, particularly various cancers (18, 58-67). These add to a growing list of AS changes and aberrant splicing events known to affect cellular features highly relevant to tumor growth, including cell transformation, motility, invasiveness, and angiogenesis (reviewed in Ref. 68).
Consistent with the notion that changes in AS can be relevant for cellular phenotypes, splicing microarray studies have reported correlations between transcript features and lymphoma grade, accurate classification of histologically distinct glial brain tumors, and improved diagnosis of prostate cancer based on differential expression of exons (63,66,67). Other medically relevant insights provided by these studies include a potential autocrine mechanism for the development of choriocarcinomas, ectopic functional expression of neuron-specific splicing regulators in lymphomas, and the influence of growth substrate on the expression profile of breast cancer cell lines (61,63,69).
Detailed knowledge of isoform expression makes it possible to identify novel, more specific, and safer targets for drug design (57). For example, voltage-independent inhibition of N-type calcium channels in the pain pathway depends critically on an alternatively spliced exon that makes these channels more sensitive to neurotransmitters and drugs (22). In this regard, individual variation in splicing patterns related to population haplotypes (5,6) may add yet another dimension to personalized medicine.

Future
Likely additions to the arsenal of high-throughput technologies for studying AS include proteomic technologies able to distinguish and quantify protein isoforms. The potential of two-dimensional difference gel electrophoresis, which compares protein samples labeled with different fluorescent dyes, has recently been illustrated by the discovery of cross-regulation and functional redundancy between paralogs of the splicing regulator PTB (31). High-throughput comparative mass spectrometry, e.g. through differential isotope labeling methods, promises quantitative measurement of protein variants.
Spectacular advances in sequencing output (ultrasequencing) are likely to represent a major breakthrough for biological research. The ability to directly and unambiguously measure the abundance of individual sequences is bound to have a deep impact on global analyses of AS. Other issues awaiting technical development include the simultaneous detection of combinations of splicing events occurring in individual transcripts, which requires single-molecule detection techniques (70,71), and the identification of novel AS events, which can be aided by constructing cDNA libraries enriched in alternatively spliced transcripts (72).
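With direct sequencing, the relative abundance of the two isoforms of a cassette exon can be estimated from reads spanning splice junctions. A minimal sketch, assuming junction reads have already been counted: the inclusion isoform contributes two junctions (upstream exon to alternative exon, and alternative exon to downstream exon) while the skipping isoform contributes one, so inclusion counts are halved before computing the fraction. Real estimators also correct for read length and mappability, which this sketch ignores.

```python
def inclusion_level(inclusion_junction_reads: int, skipping_junction_reads: int) -> float:
    """Estimated fraction of transcripts that include a cassette exon."""
    inclusion = inclusion_junction_reads / 2  # two inclusion junctions per inclusion isoform
    total = inclusion + skipping_junction_reads
    if total == 0:
        raise ValueError("no junction reads for this event")
    return inclusion / total


# Invented counts: 40 reads on the two inclusion junctions, 20 on the skipping junction.
print(inclusion_level(40, 20))  # 0.5
```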
Technologies enabling genome-wide analyses are likely to become standard tools for addressing virtually every functional, mechanistic, medical, or evolutionary question in gene function and AS. The ability to combine various technologies (e.g. cross-linking and immunoprecipitation with microarray analyses), and particularly to merge large-scale data gathering with bioinformatic methods, will be of great value for modeling biological impact, regulatory mechanisms, and networks (21).