Structure and Evolution of the Alternatively Spliced Fast Troponin T Isoform Gene*

The vertebrate fast skeletal muscle troponin T gene, TnTf, produces a complexity of isoforms through differential mRNA splicing. The mechanisms that regulate splicing and the physiological significance of TnTf isoforms are poorly understood. To investigate these questions, we have determined the complete sequence structure of the quail TnTf gene, and we have characterized the developmental expression of alternatively spliced TnTf mRNAs in quail embryonic muscles. We report the following: 1) the quail TnTf gene is significantly larger than the rat TnTf gene and has 8 non-homologous exons, including a pectoral muscle-specific set of alternatively spliced exons; 2) specific sequences are implicated in regulated exon splicing; 3) a 900-base pair sequence element, composed primarily of intron sequence flanking the pectoral muscle-specific exons, is tandemly repeated 4 times and once partially, providing direct evidence that the pectoral-specific TnT exon domain arose by intragenic duplications; 4) a chicken repeat 1 retrotransposon element resides upstream of this repeated intronic/pectoral exon sequence domain and is implicated in transposition of this element into an ancestral genome; and 5) a large set of novel isoforms, produced by regulated exon splicing, is expressed in quail muscles, providing insights into the developmental regulation, physiological function, and evolution of the vertebrate TnTf isoforms.

The troponin (Tn) complex subunit proteins, troponin T (TnT), troponin I (TnI), and troponin C (TnC), interact with tropomyosin (Tm) and actin in the thin filament and are the Ca2+-sensitive switch for muscle contraction (for review see Refs. 1-3). TnT, TnI, and TnC are each encoded by small gene families that encode functionally related isoforms. Vertebrate TnT, which is the focus of this report, is encoded by three genes that are differentially expressed in fast, slow, and heart muscles (2). Alternative mRNA splicing generates additional isoforms encoded by each of these three genes (4-7). TnT isoforms resulting from alternative mRNA splicing are differentially expressed during development and in different muscle types, indicating that these forms have specialized functions in muscle contraction.
Although TnT tethers the Tn complex to the thin filament, the regulatory functions of TnT are arguably the least understood of the thin filament proteins. Physiological and biochemical studies (see, for example, Refs. 8-14) and the discovery of TnT mutations in various organisms, including humans, suggest previously unanticipated functions for TnT (15-17). These include regulation of the Ca2+ responsiveness of contraction, sarcomere assembly, and actin-myosin cross-bridge kinetics (18). That different TnT protein domains also contribute to the functional diversity of contractile regulation in vivo is indicated by observations by ourselves and others (4-7, 16, 19, 20) of a remarkable number of TnT isoforms. The fast TnT isoform gene produces alternatively spliced variants that alter the length and acidity of the N-terminal domain, with largely unknown functional consequences (for reviews see Refs. 2, 3, and 21). Recent studies of the human hypertrophic cardiomyopathy mutation, I79N, close to residues that are hypervariable among TnT isoforms, suggest that N-terminal isoform heterogeneity influences myosin-actin kinetics (18). Alternative exons located near the C terminus of the protein encode a domain that interacts with TnC, TnI, and Tm, providing further evidence that alternatively spliced exons encode TnT domains that modulate its function in specific muscles (9,11,22,23).
In this study, we have determined the complete sequence structure of the quail (Coturnix coturnix japonica) TnTf gene and a large set of TnT cDNAs. These data provide a basis to investigate TnT isoform diversity and to undertake analysis of the regulation and functions of specific TnTf exons during fast skeletal muscle development. Comparison of the quail and rat TnTf gene structures has provided new information on the functional diversity of TnT isoforms, on the evolutionary origin of alternatively spliced exons, and on splice junction sequences that are hypothesized to regulate TnT alternative exon splicing.
Determination of the qTnTf Gene Sequence-The qTnTf gene was initially cloned in four separate genomic fragments. gC106 is a recombinant Charon 4A phage spanning from the intron after exon 6 to the intron after exon 17. The intron/exon organization of this fragment was previously reported (6); however, the nucleotide sequences were not reported. Since gC106 did not encode the 5′ end of the qTnTf gene (6), we rescreened the Charon 4A EcoRI partial library with a 32P-kinased oligonucleotide, oC023 (5′-TGTCTGATACCGAGG-3′), derived from the known 5′-untranslated sequences (6) and isolated gC1067, which encodes the 5′-most sequences of the qTnTf gene.
Since the gC1067 clone did not overlap with the gC106 clone and did not account for all of the cDNA sequences that must be present in genomic exons, we undertook PCR of quail genomic DNA to clone these intervening genomic sequences. From the 3′ end sequences of gC1067, we generated the sense primer 5′-CGGGATCCAAGCTTCTATCTCTACCAGTGTCCT-3′. From the 5′ end sequences of gC106, we generated an antisense primer 5′-CGGGATCCAAGCTTATCACTTGGCACACTGTGGAG-3′. These primers (and those described for qT2 cloning) included BamHI and HindIII sites for cloning (in italics) into pBluescript KS+. PCR amplification of quail genomic DNA recovered a 1.75-kb genomic fragment called qT1.
Since gC106 did not encode the final exon (exon 18) of the qTnTf gene, we sought to isolate the remaining genomic sequences between exons 17 and 18 utilizing a PCR strategy similar to that described above. From gC106 sequences we generated a sense primer 5′-CGGGATCCAAGCTTAGAAGGGAAGTGGCTTGCATG-3′. We generated an antisense primer from exon 18 sequences 5′-CGGGATCCAAGCTTTTGACACATCACTAAGGGCC-3′. PCR amplification of quail genomic DNA recovered a 0.7-kb fragment, called qT2.
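The cloning tails on the four PCR primers can be checked mechanically. The following is a minimal sketch, assuming that the hyphens and spaces printed inside the primer sequences are line-break artifacts, and using the standard recognition sequences of BamHI (GGATCC) and HindIII (AAGCTT). The names `SITES`, `PRIMERS`, and `site_positions` are illustrative only and do not come from the original work.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Primers as printed, with intra-sequence hyphens and spaces removed
# (assumption: those characters are typesetting artifacts).
PRIMERS = {
    "qT1_sense": "CGGGATCCAAGCTTCTATCTCTACCAGTGTCCT",
    "qT1_antisense": "CGGGATCCAAGCTTATCACTTGGCACACTGTGGAG",
    "qT2_sense": "CGGGATCCAAGCTTAGAAGGGAAGTGGCTTGCATG",
    "qT2_antisense": "CGGGATCCAAGCTTTTGACACATCACTAAGGGCC",
}


def site_positions(primer: str) -> dict:
    """Return the 0-based start of the first occurrence of each site (-1 if absent)."""
    return {enzyme: primer.find(site) for enzyme, site in SITES.items()}


for name, seq in PRIMERS.items():
    print(name, site_positions(seq))
```

Under these assumptions each primer carries both sites in its 5′ tail, which is consistent with the statement that the tails were added for cloning into pBluescript KS+.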
All of the TnT clones described above were sequenced in their entirety. Subclones were generated in varying vectors (pUC8, pUC9, pEMBL18, pEMBL19, and pBluescript KS+), dependent on when the cloning and sequencing occurred in the course of this project. The chemical method of Maxam and Gilbert (28) was used in the earliest stages of gC106 sequence characterization. The remaining DNA sequences were determined by the dideoxy termination enzymatic method of Sanger et al. (27) using both standard polymerase I large Klenow fragment and the Sequenase enzyme system (U. S. Biochemical Corp.). A modified method of Henikoff (29) was used to generate synchronous sequential deletions useful for sequence analysis. In addition, regions of the TnTf genomic sequence were determined using TnTf-specific oligonucleotides for priming of dideoxy sequencing reactions. Regions of clone gC1067 and of the qT1 and qT2 genomic fragments generated by PCR were sequenced by automated cycle sequencing on an Applied Biosystems 377 stretch sequencer at the University of Pennsylvania Genetics Core DNA sequencing facility. Approximately 90% of the genomic sequence was determined on both strands. Regions determined on only one strand were always compared with independent, overlapping clone sequences of the same strand. The genomic sequence was submitted to GenBank™ and assigned the accession number AF139128.
DNA Sequence Analysis-A contiguous sequence of TnTf genomic DNA was created with Staden, and pairwise comparisons between sequences were done with bestfit and fasta (30-33) (GCG Wisconsin package). Homology searches were performed on the NCBI BLAST server, which recovered other TnT sequences in the databases.

Sequence Analysis of a Large Set of Quail TnT Isoform cDNAs-Many TnTf isoforms are generated by alternative exon splicing of avian and mammalian mRNA transcripts. We utilized reverse transcription PCR to examine the quail TnTf N-terminal domain diversity to 1) define differential usage of alternatively spliced exons in functionally specialized quail muscles; 2) compare avian and mammalian isoform expression as a basis to understand the evolution and function of isoform diversity; and 3) define exon structure in relation to TnTf mRNAs generated by alternative mRNA splicing.
We generated 5′ end cDNA clones from day 10 embryonic leg, day 7 post-hatch leg, and 5-week post-hatch leg and pectoral muscles (see "Experimental Procedures"; Fig. 1) to identify a set of 5′ cDNA sequences that would be representative of the N-terminal diversity found in different muscle types. Fifty-four independent qTnTf cDNAs were isolated, sequenced, and compared with sequences of three quail fast TnT isoform cDNAs that we had reported previously (6). This analysis established that these primers amplified TnTf isoform cDNAs and identified isoforms not previously reported.
Comparison of Quail, Chicken, and Mammalian Fast TnT Isoforms-Cloning of the qTnTf cDNAs enabled detailed comparison of quail TnT predicted amino acid sequences to chicken, rabbit, and rat fast isoform sequences. Comparison of the exon structures and the amino acids encoded by different exon sequences highlights that most TnTf constitutively spliced exons (i.e. spliced into all mature mRNAs) are highly conserved (exons 1, 2, 5, 9-15, 18). Differentially spliced exons 4, 7, and 17 are also highly conserved between rat and quail, whereas the other differentially spliced exons (exons w, p1-5, y, 8, 16, and f) are divergent in their sequences (Fig. 2). The highly conserved exons encode domains of TnT for which some biochemical functions have been ascribed, whereas the exact functions of the N terminus and the biochemical or physiological consequences of the alterations in charge and length of the N terminus are not well understood (1-3, 34).
Quail TnTf N-terminal Variant Isoforms-Comparisons of the 57 cDNA sequences established that they represent 14 different mRNA splice forms, including 8 novel forms. These sequences also enabled the unambiguous determination of the genomic intron/exon structure. The predicted N-terminal amino acid sequences encoded by these forms and the tissue source and number of times an individual sequence was recovered are reported in Table I.
Sequence analysis of 33 cDNAs recovered from pectoral mRNA demonstrates that 31 of these RNAs encode a histidine-rich peptide (AHHEE) repeated four times and a fifth exon with the sequence AHAE (Table I and Figs. 1 and 2). The existence of exons encoding a histidine-rich (His-rich) peptide specific to avian pectoral muscle has been previously demonstrated by sequencing of chicken TnTf protein and cDNAs (35-38). Our primer extension analysis of quail pectoral RNA strongly supported the existence of these pectoral isoforms in quail; however, such clones were not recovered from pectoral cDNA libraries (6). Consistent with earlier protein expression studies and studies of mRNA expression from ourselves and others (see "Discussion"), we designate these isoforms pectoral (qPec) (see below).
The pectoral cDNAs fall into 11 different cDNA classes, 9 having the His-rich peptide (qPec1-9) and 2 being more similar to the leg-type forms. These latter two forms were therefore called qLeg2 and qLeg2a (Table I). Although qLeg2 is not represented in our limited quail leg cDNA set, it is equivalent to a reported chick leg form (38). The nine qPec forms vary by the inclusion or exclusion of peptides encoded by exons 4, w, 7, and 8. The 31 pectoral cDNAs that bear the His-rich peptide were identical for inclusion of p1-5. This is significant since Schachat and colleagues (38) reported chicken pectoral isoforms that have variant numbers of repeats of the His-rich peptide, including a form that has two more His peptide repeats than we find in the qTnTf gene. Finally, consistent with protein studies (39), leg forms are found in our pectoral cDNA sample, but these were recovered at a relatively low frequency (3/33). The postembryonic pectoral forms show a transition to forms lacking exon 4, which is considered an embryonic exon (38).
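The scale of the combinatorial splicing described here can be illustrated by enumerating the include/exclude patterns of the four exons reported to vary among the qPec forms (exons 4, w, 7, and 8). This is only a counting sketch; which of these patterns were actually recovered is recorded in Table I and is not reproduced here. The names `VARIABLE_EXONS` and `all_patterns` are illustrative.

```python
from itertools import product

# Exons reported in the text to vary among the His-rich (qPec) forms.
VARIABLE_EXONS = ("4", "w", "7", "8")


def all_patterns(exons=VARIABLE_EXONS):
    """Yield every include/exclude pattern as a dict mapping exon -> bool."""
    for flags in product((True, False), repeat=len(exons)):
        yield dict(zip(exons, flags))


patterns = list(all_patterns())
# Number of theoretically possible patterns for four independent exons.
print(len(patterns))
```

Four independently spliced exons allow 2^4 = 16 patterns, so the nine qPec classes reported in the text account for a little over half of the theoretical combinations.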
We also examined 11 cDNA sequences recovered from embryonic day 10, and 9 cDNA sequences recovered from postembryonic leg mRNA. These 20 leg sequences fell into four different classes. All these leg clones have in common that they lack the His-rich pectoral exons, and most lack exon w, but include alternatively spliced exon 4.
Clones cC501 and cC605 were previously recovered from embryonic and adult leg cDNA libraries, respectively (6). Interestingly, of the 12 sequences recovered from embryonic leg, 10 are the cC501 N-terminal form. This suggests that this is the predominant form expressed in embryonic leg. Similarly, of the 10 cDNAs sequenced from 5-week posthatch leg, 5 are identical to cC605, suggesting that this is the predominant form at this stage.
Developmental Expression of qTnTf Isoforms-We compared our set of clones with a set of 40 chicken fetal, perinatal, and adult TnTf cDNAs that was reported by Schachat et al. (38) and an independent set of 40 chicken cDNAs (20 adult pectoral and 20 adult gastrocnemius) that was reported by Ogut and Jin (40). The set of chicken cDNAs from Schachat and co-workers (38) was made from mRNA isolated from fetal (16 and 19 day embryos), perinatal (day 5), and adult chicken pectoral muscles and fell into 16 independent, alternatively spliced sequences that the authors placed into two broad classes based on length (Table I). Class I sequences were shorter in length, since they encoded neither the His-rich peptide sequence nor exon y, and were predominant in fetal and neonatal muscles. In addition, class I sequences often failed to include exon w. In contrast, class II sequences contained either of two length variants of the His-rich peptide and always encoded exon w and sometimes exon y. Exons 7 and 4 varied in either class cDNA. Class I sequences represented 50% of the forms in fetal pectoral muscle, with decreasing representation in neonatal and adult muscle samples (~20%). In contrast, class II sequences have lower relative representation in fetal and neonatal muscles but are predominant in adult pectoral muscles. Exon y was only identified in class II fetal samples.
Similar to their findings, we identified the His-rich peptide sequences in cDNAs isolated from pectoral mRNA (36,40); however, we observed no variation in the size of the His-rich peptide included, suggesting a difference between the coding potentials of the chicken and quail TnTf genes (38). Since the larger His-rich peptide was observed in chicken fetal muscle, it is possible that a larger variant exists in the quail and that analysis of fetal quail muscle may identify these additional exons. However, as described below, our analysis of the genomic sequence has not identified additional His-rich encoding exons.

FIG. 1. Representation of quail TnTf exons and translated amino acid sequences. All known exonic sequences are included sequentially. Splice boundaries are demarcated with a slash, and exon numbers are assigned based on previously published sequences (6,7,44) or as assigned in this report (see legend to Fig. 4). The last nucleotide of each exon is underlined. In the case of split codons, the amino acid is included with the exon that has 2 nts of the codon. Sequences of oligonucleotide primers used to PCR-amplify the set of N-terminal variant cDNAs overlap with the boundaries of exons 2/3 and exons 11/12 and are highlighted in italics (see "Experimental Procedures"). 5′-untranslated sequences are derived from previous primer extension studies (6) and by comparison to the genomic sequences. Alternative exons α and β (16 and 17) are presented in tandem, although processed mRNAs include only one of these exons. Combinatorially spliced exons include exons 4, w, p1-5, 6, 7, y, 8, and 9 (see Table I and Fig. 3).

TABLE I Summary of the qTnTf N-terminal isoform variants and comparison to known chicken isoforms
The abbreviations used are as follows: post-h, posthatch (columns represent posthatching stage); chick equiv., chick equivalent; pred., predominant. Columns 2-8 indicate exons in the combinatorial region examined; the next 4 columns indicate the stage at which mRNA was isolated, and the last column indicates whether an equivalent chick isoform has been reported. The rows indicate the name of the isoform represented. For each isoform, plus indicates that an exon is included, and minus indicates that an exon is excluded. Numbers represent how many times a particular isoform sequence was represented in the mRNA sample. Add is an adductor form based on primer extension sequences (6). cC501 and cC605 are cDNAs isolated from embryonic cultures and adult leg cDNA libraries (6). qLeg2f, qLeg2fa, and qLeg3 are quail forms that resemble the exon usage of the predominant mammalian forms TnT2f, TnT2fa, and TnT3 (66). Six isoforms are homologous to other chicken cDNAs (38,40), and eight are new TnT isoforms.
a Isoforms with exons (p1-5) that are essentially exclusive to pectoral muscle (qPec).
b Isoforms without p1-5 exons.

Non-conservative substitutions are not highlighted. Exons are marked by slashes and numbered as described in the legend for Fig. 1, and the boundaries are based on the rat and quail genomic sequences. Exon boundaries are identical for rat and quail TnTf genes, except for exons that are unique to each species, and these species-specific exons are highlighted by the subscript m (mammalian) or q (quail), respectively. Although the rabbit cDNAs have sequences that look related to the avian fetal exon y (66), these sequences have not been identified in the rat genome, and it is not clear if the VHVP peptide is part of exon 7 or represents a homologous exon y. Thus we designate exon 7 as q and r (for rat), to represent that the exon boundaries are clear only for the quail and rat. This comparison highlights the substantial heterogeneity among mammalian and avian TnTf sequences in the N terminus and the high conservation in the remainder of the protein.
We did not identify a cDNA bearing exon y (Table I) that was found in the fetal chicken class II cDNA set (36); however, the quail gene has a predicted exon located between exons 7 and 8 that corresponds to y in position and sequence content (see below). Our failure to recover an isoform bearing exon y in any of our samples likely reflects our analysis of mRNAs from muscles of day 10 quail embryos, whereas other studies that identified exon y isoforms had analyzed later fetal stages.
In the N-terminal region, we found that phylogenetically conserved exons 4-8 showed similar splicing patterns between birds and mammals, consistent with the findings of Schachat and co-workers (38). Most significantly, exon 4 is present in mammalian perinatal TnT (35/40 quail and 29/31 chicken perinatal mRNAs have exon 4). In addition, our set included three leg forms, called qLeg2, qLeg2a, and qLeg3, that appear similar to the predominant vertebrate leg forms TnT2f, TnT2fa, and TnT3 (TnT2f was identified in day 7 pectoral muscle). Although the number of cDNAs analyzed does not allow statistical analysis, qLeg2a appears to be highly represented in leg muscle.
Unexpectedly, we observed a lack of significant overlap among TnTf isoforms sequenced from the chicken and quail cDNA sets. Only three forms were represented both in the quail and the chicken sets. This difference, in part, is due to the lack of y exon inclusion in any of our forms and thus likely reflects the difference in stage of muscle analyzed (38). However, both chicken and quail studies identified the same predominant adult pectoral form (see Table I). Exon usage supports the inclusion of exon 4 in perinatal muscles, and His-rich pectoral peptides are predominant in the avian pectoral muscles. These data support the conclusion that many different splice forms are produced during pectoral muscle development and provide further compelling evidence that the developmental regulation of splicing that generates TnTf isoforms is sufficient to account for accumulation of specific TnTf proteins.
Intron/Exon Organization of the qTnTf Gene-To unambiguously define exon splicing patterns that give rise to TnTf isoforms, we determined the DNA sequence of a contiguous region of 33,434 nts that encodes the qTnTf gene (see "Experimental Procedures" and Fig. 3). Clone gC1067 is a 13,876-nt clone that encodes the most 5′ sequences of the gene. Exon 1 starts at position 1178 in the sequenced regions of gC1067. qT1 bridges the genomic phage clones gC1067 and gC106 from position 13,563 to position 15,928 in the consensus sequence, ending 315 nt into gC106. gC106 is a 17,105-nt clone that begins at 15,625 and ends at 32,718 in the consensus sequence. qT2 begins at 32,363 in the consensus sequence and ends at 33,128 in exon 18. The remaining 3′-untranslated TnT sequences are derived from the TnTf cDNA sequences up until the poly(A) tail at nucleotide 33,434. The structure, based on these DNA sequences and comparisons to qTnTf cDNAs, is represented in Fig. 3. We identified 25 exons, only one of which, the putative fetal exon y, has not been confirmed in a quail cDNA. Of the 25 exons, 13 are differentially spliced from the qTnTf mRNA transcript.
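The way the four cloned fragments tile the consensus sequence can be checked with a short calculation. The sketch below uses the consensus coordinates quoted above for qT1, the phage clone gC106, and qT2. It assumes that gC1067 begins at consensus position 1, which the text does not state explicitly (only its length of 13,876 nt is given), and it treats all coordinates as 1-based and inclusive. The names `CLONES`, `overlaps`, and `is_contiguous` are illustrative.

```python
# (name, start, end) in consensus coordinates, 1-based inclusive.
# Assumption: gC1067 starts at position 1; the text gives only its length.
CLONES = [
    ("gC1067", 1, 13_876),
    ("qT1", 13_563, 15_928),
    ("gC106", 15_625, 32_718),
    ("qT2", 32_363, 33_128),
]


def overlaps(clones):
    """Return the overlap length in nt between each pair of neighbouring clones."""
    result = {}
    for (a, _, a_end), (b, b_start, _) in zip(clones, clones[1:]):
        # A value <= 0 would indicate a gap between the two clones.
        result[(a, b)] = a_end - b_start + 1
    return result


def is_contiguous(clones):
    """True when every neighbouring pair of clones overlaps by at least 1 nt."""
    return all(length > 0 for length in overlaps(clones).values())


print(overlaps(CLONES))
print(is_contiguous(CLONES))
```

Overlap lengths obtained this way depend on the coordinate convention assumed here and need not match the figures quoted in the text exactly; the point of the check is only that no gap is left between neighbouring fragments.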
Identification of a Repeated Sequence Element Containing the Pectoral-specific Exons-In our sequence analysis of the genomic sequences encoding the pectoral exons, we discovered a remarkable degree of sequence identity that included and flanked each of the five pectoral-specific exons. These exons (four of the five p exons are 15 nt in length) are encompassed within an approximately 900-bp sequence element that is repeated four times and one time partially (150 nucleotides), having 74-82% sequence identity in pairwise comparisons (Fig. 4).
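Pairwise identity figures such as the 74-82% quoted for the repeats are computed column by column over an alignment. The following is a minimal sketch of one common convention, in which columns containing a gap in either sequence are skipped. It is not the GCG bestfit procedure used in this work, and the toy sequences are hypothetical, not the quail repeat sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical positions between two equal-length aligned sequences.

    Columns where either sequence has a gap are excluded from the comparison.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical toy alignment for illustration only.
toy = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTTCGTAC",
    "r3": "ACG-ACGTAA",
}
for (n1, s1), (n2, s2) in combinations(toy.items(), 2):
    print(n1, n2, round(percent_identity(s1, s2), 1))
```

Other conventions count gap columns as mismatches, which lowers the reported identity; the choice should be stated whenever such percentages are compared between studies.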
Identification of a CR1 Retrotransposon Located 5′ of the Pectoral-specific Exon, p1-We identified a sequence related to the chicken CR1 transposable element upstream of the first pectoral repeat. A family of chicken middle repetitive sequences, termed CR1 for chicken repeat 1, has been described for the chicken genome. CR1 is a member of the non-long terminal repeat class of retrotransposons (41-43). CR1 elements are flanked by imperfect direct repeats of an octamer sequence having the consensus (CATTCTRT) (GATTCTRT). The sequence CAATTCT GATCTTCT in the quail intron 1.5 kb upstream of exon p1 was identified, as well as some flanking sequences of approximately 90 bp having similarity to CR1 sequences (nucleotides 8990-9081 in the submitted sequences), but we did not identify transposase encoding sequences associated with this element. This CR1 element apparently transposed into the TnTf locus, and its proximity to the pectoral repeat may be relevant to the introduction of these sequences into the avian genome (see "Discussion").

(7,44) to identify the phylogenetically conserved exons between the quail and published rat exon designations. Exons w and y are designated according to Schachat et al. (36). Pectoral His repeat exons are designated p1-5 (see text). We have followed, with modification, the convention of Breitbart et al. (7,44), for representations of the exon structure, as follows: gray, untranslated sequences in constitutive exons; black, translated and constitutive; white, combinatorial (bracketing of the pectoral exons indicates that these are spliced together as a set in quail); striped, exchangeable and mutually exclusive (alternative) exons. Flush junction boundaries indicate that the exons begin or end with an intact codon; concave/convex boundaries indicate that the upstream exon ends in a split codon using a single nucleotide contributed by the downstream exons; and sawtooth boundaries indicate that the upstream exon lacks 2 nucleotides that are contributed by the downstream exon in the processed mRNA.
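The CR1 octamer consensus is written with the IUPAC ambiguity code R (A or G), so searching for it requires expanding the code. The sketch below converts an IUPAC motif to a regular expression and reports exact matches. The test sequence is hypothetical, and the function names are illustrative. Because the quail octamers are described as imperfect repeats, an exact search of this kind would have to be relaxed to tolerate mismatches before it could recover them.

```python
import re

# Subset of IUPAC nucleotide codes needed for the CR1 octamer consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif: str, sequence: str) -> list:
    """Return 0-based start positions of exact matches, including overlapping ones."""
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one copy of each octamer.
toy = "GGCATTCTATGGGATTCTGTCC"
print(find_motif("CATTCTRT", toy))
print(find_motif("GATTCTRT", toy))
```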
Comparison of the Rat and Quail TnTf Gene Structures-The quail TnT gene is encoded in greater than 33 kb (Fig. 3), whereas the rat TnT gene spans approximately 16 kb (7,44). The quail protein is encoded in 25 exons, 12 constitutively spliced exons and 13 alternatively spliced exons, whereas the rat gene has 19 exons, with 11 exons predicted as constitutively spliced, and 8 exons predicted to be alternatively spliced (including exon 5, which is more likely constitutively spliced; see "Discussion"). The difference in the alternative exon coding potential between these two genes resides in the N-terminal hypervariable regions of the protein, with the remainder of the gene structure (exons 9-18) being essentially identical in organization between the genes. This high conservation is also reflected in the nearly complete identity of the amino acid sequences in the C-terminal coding region (Fig. 2).
Analysis of Splice Site Consensus Sequences-We compared splice acceptor and donor sequences flanking constitutive and alternatively spliced exons to assess if there are significant variations and similarities between the quail and rat splice junctions that could identify sequences important for regulating splicing (Table II and legend). The analysis shows that exons p1-5, 6, 7, and y have a non-consensus 3′ SS at the −3 position; instead of the consensus YA(G/G), the site is AA(G/G). This variant acceptor sequence is highly exceptional, as the −3 position conforms to the consensus in 96% of acceptor junctions, suggesting that these acceptor sequences function in splicing regulation, as has been proposed for alternatively regulated exons of some other genes (e.g. Ref. 45). The rat fetal exon (f) also shows a similar non-consensus 3′ SS, and the chicken gene shows alternative splicing of the pectoral region in fetal muscle. Thus, this 3′ SS sequence may have special significance for fetal splicing regulation. Finally, it is noted that two polypyrimidine sequences, with PTB consensus binding sites, are located immediately 5′ of the splice acceptors of the tandemly repeated, pectoral-specific exons p1-p5 (Fig. 4). PTB consensus binding sites have been implicated in the positive and negative regulation of alternative exon splicing (46-48).

FIG. 4. Alignment of the genomic repeat sequence element that includes the pectoral exons. Analysis of sequences surrounding the pectoral His repeat exons reveals remarkable, highly conserved repeated elements that are 74-82% identical in pairwise comparisons. The repeat is referred to based on the His repeat exon encoded within it (p1-5, 5′ to 3′). C represents the consensus sequence for all repeats based on identity in 4/5 or 3/4 in the regions where the p5 repeat has ended; minus indicates a lack of consensus. * indicates nucleotide gap(s) in the best fit alignment made, and in some cases this is also the consensus. The PTB-binding site is underlined and the nonconserved 3′ SS A (−3) is in bold (also see Table III).
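The distinction drawn in the splice site analysis between a consensus YAG acceptor and the AAG variant depends only on the last three intronic nucleotides (positions −3, −2, and −1). A minimal classification sketch follows; the function name and the example sequences are hypothetical and are not taken from Table II.

```python
def classify_acceptor(intron_3prime_end: str) -> str:
    """Classify a 3' splice site from the last three intronic nucleotides.

    The argument is intronic sequence ending at position -1, immediately
    upstream of the first exonic nucleotide.
    """
    tri = intron_3prime_end[-3:].upper()
    if len(tri) < 3 or tri[1:] != "AG":
        return "no AG dinucleotide"
    if tri[0] in "CT":  # Y = pyrimidine at position -3
        return "consensus YAG"
    return f"non-consensus {tri}"


# Hypothetical intron ends for illustration.
print(classify_acceptor("ttttcag"))
print(classify_acceptor("ttttaag"))
```

Applied to a table of intron ends, a classifier of this kind would separate acceptors such as those described for exons p1-5, 6, 7, and y from the majority that carry a pyrimidine at −3.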
Comparison of the rat and quail nucleotide sequences flanking the alternatively spliced, mutually exclusive exons 16 and 17 (Table III) reveals a pattern of conserved purines (5 for 16 and 7 for 17) in the pyrimidine-rich region of the 3′ SS. Furthermore, the extent of sequence identity was high in the 3′ SS consensus of exon 17 as compared with splice acceptor sequences in other exons, suggesting that splice acceptor sequences may contribute to regulation of exon 16 and 17 alternative splicing. Sequence comparisons of other intronic regions of the quail and rat genes did not reveal any significant sequence homology, supporting this possibility.

DISCUSSION

Here, we report the complete structure of the quail TnTf gene and a detailed analysis of the expression of TnT isoforms produced by alternative splicing of its N-terminal exons. These structural and expression data provide significant new information on the evolution, function, and developmental isoform regulation of TnTf genes in birds and mammals. We show that the qTnTf gene has 25 exons. The reported sequences include 1178 nts of sequences upstream of the transcription start site and exon sequences that comprise only 4% of the total genomic sequence. The amino acid sequences encoded by constitutively spliced exons are highly conserved between quail and rat TnTf, with the exception of mini-exon 3. The rat and quail exons share identical split codons for all homologous exons. This conservation suggests that the differential splicing of exons in this gene evolved prior to mammalian/avian divergence. As previously discussed (7,44), distribution of split codons requires that exons 3 and 9 always be spliced into mRNAs to maintain the translational reading frame, whereas different combinations of combinatorial exons between 3 and 9 can be included. The quail TnTf gene, however, is nearly twice the size of the rat gene.
This size difference may be due in part to the introduction of the avian-specific exons and to the intron/exon duplications of the pectoral exon region; however, the intron sizes in general are larger in the quail than in the rat (see Fig. 4). The species-specific (p, y, fetal) exons likely serve specialized functions in the TnTf protein (see below). Finally, although exons 16 and 17 show similar alternative splicing and similar organization in the quail and rat, exon 17 encodes divergent protein sequences, suggesting specialized functions for this domain in avian and mammalian muscles (23).

67 and 68. Sequence comparisons among genes in the databases show that for the 3′ SS consensus, 100% of acceptors conform to the −1, −2 AG, 96% conform at the −3 pyrimidine (Y) position of the YAG sequence, and 50% are nonconsensus at the +1 G position, and for the 5′ SS consensus, 60% conform to the −2 A position, 80% conform to the −1 G residue, and 100% conform to the +1 and +2 GT consensus. Variations in the sequence from consensus sequences are indicated by boldface letters. Sequences for short exons are included in their entirety. Longer exons are represented at their 5′ and 3′ junctions with omitted sequences indicated by -N-, with N being the number of nucleotides omitted. Y represents pyrimidine residues. No reproducible pattern of variation from consensus splice junctions is evident across all alternatively spliced exons; however, the exons p1-5, 6, 7, and y all have a nonconsensus A at the −3 position of the 3′ SS sequence. The rat fetal exon, f, also shares the nonconsensus sequence AAG, with A at the −3 position of the 3′ SS. Alternatively spliced exons 16 and 17 (α and β) have shared features with the rat gene, which are described in Table III.
Exon   3′ SS (acceptor sequence)   Exon sequences   5′ SS (donor sequence)
Several significant features of the quail TnTf gene were revealed through these structural and exon expression analyses and comparisons to the homologous rat TnTf gene (7,44). These features include the following: 1) identification of 13 constitutively spliced, 11 (4, w, p1-p5, 6, 7, y, and 8) combinatorially spliced, and 2 (16 and 17) alternatively spliced exons; 2) the combinatorially spliced exons encode N-terminal TnT sequences and are subject to developmental and muscle-specific splicing regulation in the skeletal muscles of the quail embryo; 3) 5 of these N-terminal exons, which encode the pectoral-specific, His-rich domain that is unique to the pectoral muscle TnTfs of Galliformes and Craciformes, are encoded within a novel, 900-bp intronic sequence that is tandemly repeated; 4) a CR1 transposable element sequence located immediately 5′ of this tandemly repeated domain is implicated in transposition of these pectoral muscle exons and associated intronic repeat in the N-terminal TnTf region of additional combinatorially spliced exons; 5) non-consensus splice acceptor sequences and adjacent polypyrimidine tracts and PTB-binding sites that are implicated in tissue-specific splicing regulation of the pectoral muscle-specific exons (46-48); and 6) conserved, non-consensus splicing acceptor sequences associated with the mutually exclusive, alternatively spliced exons 16 and 17 in the C-terminal domain, further implicating splice acceptor sequences in alternative exon splicing regulation.
Evolution of the Pectoral-specific His Repeat Exon Domain-Intronic sequences that surround the five pectoral-specific exons (p1-5) are highly conserved and tandemly repeated 4 and 1/8 times, resulting in four identical mini-exons (p1-4) bearing the His repeat motif, AHHEE, and one divergent mini-exon bearing the His motif, AHAE. Since this motif is amplified to variable extents in different species of birds, recombination appears to have been actively occurring within this domain during speciation within the Galliformes and Craciformes orders (49). Although a similar recombination mechanism has been proposed for the origin of the N-terminal combinatorial exons and exons 16 and 17 of the TnTf gene based on their related exon sequences (7,44), no related intronic sequences were observed in the regions flanking these alternative exons. To our knowledge, the conservation of the repeat sequences of the quail pectoral-specific exons and surrounding intronic sequences is novel, providing direct evidence of intragenic recombination generating duplicated exons. Indeed, conservation of the intron and exon sequences of the His repeat region of quail TnTf is striking; comparisons of other intronic sequences among chick and quail genes reveal only 25% similarity,² the level expected for random drift of "nonfunctional" sequences after divergence of the chick and quail lineages. This indicates that there are strong selective pressures to maintain the conservation of the intron and exon sequences around exons p1-5. These selective pressures are likely to be multiple, including protein structure and functional requirements to maintain and amplify the His repeat motif within the TnTf protein for specialized pectoral muscle functions and requirements to conserve splicing regulatory sequences that restrict the splicing of these exons to the pectoral muscles. 
In this regard, it is notable that the chicken TnTf pectoral exon has two additional His repeat motifs as compared with the quail (38). Together these data suggest a role for recombination mechanisms to amplify and to maintain homology in these intronic and exonic sequences. Such mechanisms have been discussed for maintenance of repeated gene families (50-52). It will be of considerable interest to determine the genomic organization of the chick TnTf intron/exon domain encoding these pectoral exons and to compare it with the quail pectoral exon domain, as an approach to the analysis of the recombination mechanisms that lead to the conservation of these intronic sequences and of the sequences responsible for pectoral exon splicing regulation.
Two DNA transposition hypotheses can now be considered for the introduction of the His repeat exon domain and its repeated intron sequences into the ancestral genome of Galliformes and Craciformes. We have identified a CR1 retrotransposon element (41) 1.5 kb upstream of the tandemly repeated intronic domain region, suggesting that a retrotransposon mechanism could have inserted and duplicated the pectoral exons. In its current state, the CR1 repeat in the qTnTf gene is imperfect as its second direct repeat is 6 and not 8 nt, and no transposase open reading frame is associated with this CR1 element; however, there are approximately 90 bp upstream having CR1-related sequences, verifying its relationship to CR1. Although sequences associated with the pectoral repeat have not been otherwise associated with CR1 elements, it is possible that a mobile CR1 picked up these sequences at another genomic site prior to insertion into TnTf. Evidence that CR1 retrotranspositional machinery can be usurped and facilitate movement and insertion of non-CR1 sequences has been presented by others (42,53). As a test of this hypothesis, it will be of interest to examine and compare the TnTf genomic sequences of the chick and other Galliformes and Craciformes for the presence and structure of this CR1 element and its relationship to the His repeat domain.
An alternative hypothesis for the origin of the pectoral exons is based on the observation that the pectoral exons are 54% related to portions of the histidine/alanine-rich proteins present in the avian malarial parasite Plasmodium lophurae (49), suggesting the possibility that this exon sequence was acquired by a recombinational event between the malarial parasite genome and the genome of the ancestor of the Galliformes and Craciformes. Since the sequences encoded within a single avian exon (AHHEE) do not correspond exactly to this metal-binding motif (HXXXH) (see Figs. 1 and 2), this motif would have been split into exons, according to this model, either prior to or after introduction into the ancestral genome.
Physiological Functions of the Pectoral His Repeat Sequences-The pectoral muscles of Galliformes and Craciformes are distinctive in that they function in explosive but short-lived flight, and expression of TnTf isoforms with the His repeat could reflect a specialized functional adaptation related to muscle metabolism and function. Interestingly, bacterial and vertebrate metal-binding proteins have HXXXH motifs typical of the quail pectoral exon. These sequences, in α-helical form (such as predicted for the avian motif), generate high affinity binding sites for transition metal ions (Cu2+, Ni2+, Zn2+, and Co2+), and TnTf isoforms bearing this domain are specifically retained on metal affinity columns (49). Birds having the HXXXH motif also have lower Zn2+ concentration in their pectoral muscle, suggesting that this domain could regulate the free metal ion concentrations in pectoral muscle. Consistent with a possible role in muscle protein function, binding of metal ions to the N terminus has been shown to generate conformational changes in TnT N-terminal peptides by stabilizing α-helical structure and influencing binding to Tm (54). Alternatively, the His repeat domain may be sensitive to alterations in pH that occur during anaerobic work, e.g. a change from cellular pH of 7.1 to a pH of 6.5 (55,56). Histidine-imidazole groups are unusual in that they have a pK close to 7 at 25°C.
At pH values typical of biological systems, these groups are approximately half-protonated and thus are well suited to function in reversible protonation/deprotonation events. For instance, a pH-dependent effect of histidine-imidazole groups in enzyme function is well documented for lactate dehydrogenase in muscle (reviewed in Ref. 56). Consistent with a specialized function of the His repeat motif in anaerobic pectoral muscle function, recent studies show that the N-terminal region of the pectoral TnTf isoform has pH-dependent binding affinity for Tm, with TnTf binding to Tm being more stable at lower pH (40). Therefore, specialized functions of the His exon protein domain in pectoral TnTf could control the conformation of this domain and interaction with Tm and thus shift the contraction force, pK dynamics, in the muscle in response to the change in the protonation state of the histidine residues. These mechanisms could provide dynamic ways to alter the activation state, dependent on the pH of the muscle. Although the functional contributions of the His domain in vivo are unclear, the N-terminal variable domain of TnT, which includes the pectoral His domain, is physiologically distinct for fast skeletal muscle types (10,14,57), and the importance of this region has been highlighted by the discovery of mutations near this N-terminal domain that disrupt TnT function and result in human disease (18). We are currently testing these hypotheses of TnT domain function through in vivo expression studies of TnTf isoforms characterized in this study.
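The pH argument above can be made concrete with the Henderson-Hasselbalch relation. The following is a minimal sketch, not a measurement: it assumes an imidazole pK of exactly 7.0 (the text says only "close to 7"), and the function name `fraction_protonated` is introduced here for illustration. The pK of a histidine side chain inside a folded protein can differ from this value.

```python
def fraction_protonated(ph: float, pka: float = 7.0) -> float:
    """Fraction of imidazole groups in the protonated form at a given pH.

    Henderson-Hasselbalch: [HA]/([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 7.0 is an assumed round value, not a measured one.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Resting and anaerobic-work pH values quoted in the text (7.1 and 6.5).
    for ph in (7.1, 6.5):
        print(f"pH {ph}: {fraction_protonated(ph):.2f} protonated")
```

Under this assumption, the protonated fraction rises from about 0.44 at pH 7.1 to about 0.76 at pH 6.5, which illustrates why a His-rich domain could respond to the pH shift that accompanies anaerobic work.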
mRNA and Protein Analysis of TnT Isoform Expression-Through combinatorial and alternative exon splicing, the TnTf gene has the potential to produce a large number of different mRNAs for production of isoforms with variations of protein structure in the N- and C-terminal domains. Our quail data in combination with chicken TnTf isoform data (36,40) establish that there is great diversity of TnTf mRNAs and that their expression is highly regulated in different avian muscles and during development. It also is of interest that there is very little overlap between the isoforms that we have identified and the isoforms reported in chick (36). The findings of extensive TnTf mRNA isoform diversity, as assayed by reverse transcription PCR at the mRNA level, are consistent with results from studies of TnT protein isoforms (for example see Refs. 35, 39, and 58-61). It is currently difficult to compare the muscle specificity of avian TnT isoforms to the characterized mammalian isoforms, although it is notable that qTnT2fa, which is abundant in adductor muscle, is homologous to the mammalian TnT2fa form in fast oxidative muscles (tongue), which are similar in fiber type to the avian adductor. In addition, although a remarkable number of isoforms (at mRNA and protein level) are made in avian muscles, some studies suggest that the number of abundant cDNAs and proteins made in mammals may be more restricted (62). Furthermore, our data and the data of Briggs and Schachat (62) suggest that exon 5 is most likely a constitutively spliced exon, rather than a combinatorially spliced exon, since all cDNAs sequenced to date from chicken, quail, and mammals include this exon.
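To give a sense of scale for the "large number of different mRNAs" mentioned above, the sketch below computes a purely theoretical upper bound from the exon counts given in this Discussion (11 combinatorial exons; exons 16 and 17 mutually exclusive). It assumes every combinatorial exon is included or skipped independently, which the data do not claim; the function name is hypothetical, and the isoforms actually observed are a regulated subset.

```python
# Exon labels as listed in the Discussion; used here only for counting.
COMBINATORIAL_EXONS = ["4", "w", "p1", "p2", "p3", "p4", "p5", "6", "7", "y", "8"]
MUTUALLY_EXCLUSIVE_EXONS = ["16", "17"]


def max_splice_patterns(n_combinatorial: int, n_exclusive: int) -> int:
    """Upper bound on distinct exon combinations.

    Each combinatorial exon is either in or out (2 choices each), and
    exactly one of the mutually exclusive exons is chosen.
    Assumes independence between exons, which is an idealization.
    """
    return (2 ** n_combinatorial) * n_exclusive


if __name__ == "__main__":
    bound = max_splice_patterns(
        len(COMBINATORIAL_EXONS), len(MUTUALLY_EXCLUSIVE_EXONS)
    )
    print(f"Theoretical upper bound: {bound} splice patterns")  # 2**11 * 2 = 4096
```

This figure is a ceiling on coding capacity rather than a prediction; developmental and muscle-specific regulation restricts which combinations are expressed.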
Non-consensus Splice Acceptor Sequences of Alternatively Spliced TnTf Exons-The combinatorially spliced TnTf exons, p1-5, 6, 7, and y, all have a non-consensus purine nucleotide at the −3 position in the 3′ SS consensus. The rat TnTf fetal exon is also non-consensus in this position. This variant purine in the 3′ SS consensus likely contributes to the regulated splicing of these exons, as supported by genetic studies of the Drosophila myosin heavy chain gene, which also has alternative exons with variant splice acceptors (45,63). It also is notable that there are perfect polypyrimidine sequences containing PTB-binding sequences immediately 5′ of the non-consensus splice acceptors of the five pectoral exons (p1-5). Such polypyrimidine/PTB sequences participate in the regulated alternative exon splicing of muscle tropomyosin and other tissue-restricted mRNA splicing (46-48). In addition to regulatory sequences associated with the splice acceptor of TnTf combinatorial exons, acceptor sequences immediately 5′ of alternatively spliced exons 16 and 17 in the rat and quail TnTf genes have nearly identical purine substitutions in what should be the pyrimidine-rich domain, and a variant lariat branch sequence, which does not match the consensus in the region upstream of exon 17, is displaced toward the donor. The conserved pattern of purine substitutions may control splice site selection by splicing regulatory factors (64), and displacement of the lariat may result in steric hindrance of lariat formation. It is also possible that splicing enhancers within exon sequences contribute to regulated splicing mechanisms, as has been found for the chicken heart TnT isoform gene (65). 
These findings, together with the availability of the complete TnTf gene structure, make possible directed experiments to test the specific regulatory functions of the splicing acceptor sequences of both combinatorial and alternatively spliced exons and to identify other candidate splicing regulatory sequences as well as the muscle-specific splicing factors that interact with these sequences and likely regulate their splicing.
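The −3 position check described in this section can be sketched in a few lines. This is an illustrative helper, not the analysis used in the study: the function name `acceptor_features` and the two example sequences are hypothetical, and only the YAG rule stated in the text (pyrimidine at −3, AG at −2 and −1) is tested.

```python
def acceptor_features(intron_3prime_end: str) -> dict:
    """Inspect the last three intron bases at a 3' splice site.

    Positions are counted back from the intron/exon boundary:
    -1 and -2 should be G and A (the AG dinucleotide), and -3 is a
    pyrimidine (C or T) in the YAG consensus.
    """
    seq = intron_3prime_end.upper().replace("U", "T")  # accept RNA or DNA
    if len(seq) < 3:
        raise ValueError("need at least the last 3 intron nucleotides")
    minus3, dinucleotide = seq[-3], seq[-2:]
    return {
        "has_AG": dinucleotide == "AG",
        "minus3": minus3,
        "minus3_is_pyrimidine": minus3 in "CT",
    }


if __name__ == "__main__":
    # Hypothetical intron ends, invented for illustration only.
    for label, seq in [("consensus-like", "TTTCTCTTTCAG"),
                       ("purine at -3", "TTTCTCTTTAAG")]:
        print(label, acceptor_features(seq))
```

Applied to real intron ends from the gene sequence, a helper of this kind would flag acceptors such as those of exons p1-5, 6, 7, and y, which carry an A at −3.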