Evolution, Organization, and Expression of α-Tubulin Genes in the Antarctic Fish Notothenia coriiceps
ADAPTIVE EXPANSION OF A GENE FAMILY BY RECENT GENE DUPLICATION, INVERSION, AND DIVERGENCE*

To assess the organization and expression of tubulin genes in ectothermic vertebrates, we have chosen the Antarctic yellowbelly rockcod, Notothenia coriiceps, as a model system. The genome of N. coriiceps contains ~15 distinct DNA fragments complementary to α-tubulin cDNA probes, which suggests that the α-tubulins of this cold-adapted fish are encoded by a substantial multigene family. From an N. coriiceps testicular DNA library, we isolated a 13.8-kilobase pair genomic clone that contains a tightly linked cluster of three α-tubulin genes, designated NcGTbαa, NcGTbαb, and NcGTbαc. Two of these genes, NcGTbαa and NcGTbαb, are linked in head-to-head (5′ to 5′) orientation with ~500 bp separating their start codons, whereas NcGTbαa and NcGTbαc are linked tail-to-tail (3′ to 3′) with ~2.5 kilobase pairs between their stop codons. The exons, introns, and untranslated regions of the three α-tubulin genes are strikingly similar in sequence, and the intergenic region between the αa and αb genes is significantly palindromic. Thus, this cluster probably evolved by duplication, inversion, and divergence of a common ancestral α-tubulin gene. Expression of the NcGTbαc gene is cosmopolitan, with its mRNA most abundant in hematopoietic, neural, and testicular tissues, whereas NcGTbαa and NcGTbαb transcripts accumulate primarily in brain. The differential expression of the three [...] of the β [...] to ensure efficient synthesis of tubulin polypeptides.

bolic efficiency and preserve macromolecular function in their now chronically cold marine environment (−1.86 to approximately 1°C). The translational machinery of these fishes, for example, shows clear evidence of cold adaptation (3,4), with rates of polypeptide chain elongation more than 10-fold greater than those measured in temperate fishes cooled to comparable temperatures. Similarly, the polymerization energetics of the actins of Antarctic fishes (5) and the ATPase activities of their skeletal myosins (6) support efficient myofibrillar assembly and function at their low habitat temperatures. Our goal is to determine the molecular adaptations, both qualitative and quantitative, that maintain the efficient expression of the tubulin genes and the polymerization capacity of the tubulin polypeptides of these extreme psychrophiles.
We and others (7,8) have shown previously that the critical concentration for microtubule formation by the brain tubulins of Antarctic fishes (~1 mg/ml) is comparable to those of temperate poikilotherms and homeotherms at their much higher body temperatures. Conservation of the critical concentration by Antarctic fishes probably results from structural changes, both in primary sequences and in posttranslational modifications, intrinsic to their α- and β-tubulin subunits (9-12). The primary sequence of class II β-tubulin from the yellowbelly rockcod Notothenia coriiceps, for example, contains unique residue substitutions that increase both the hydrophobicity and the flexibility of the polypeptide chain (12,13), two factors that should favor microtubule formation in an energy-poor environment. Similar alterations have been observed in other α- and β-tubulin chains of this species.² A second, related challenge confronting Antarctic fishes is the synthesis of sufficient quantities of the α- and β-tubulins to attain the critical concentration of tubulin dimers in their cells.
The abundant expression of tubulin in the brains of Antarctic fishes is likely to require compensatory adjustments in gene transcription to offset the rate-depressing effects of low temperature. Potential adaptations include increases in tubulin gene number, organization of tubulin genes into efficient transcription units, evolution of more efficient gene promoters, enhancers, transcription factors, and/or RNA polymerases, and enhancement of mRNA stabilization. To evaluate these possibilities, we have initiated analysis of the structure, genomic organization, and expression of the tubulin genes of N. coriiceps. Our results suggest that several of these modes of adaptation may be exploited by these cold-living vertebrates.
In higher vertebrates, the α- and β-tubulins are encoded by small gene families (~6-7 functional genes for α and a similar number for β), each member of which yields a structurally distinct polypeptide (14-16). These genes are generally thought to be unlinked and dispersed throughout the genome (17). In a study of the chicken α-tubulin gene family, for example, Pratt and Cleveland (18) found that four of five genomic clones contained single α-tubulin genes; the fifth contained two α-tubulin genes, one functional and the second a pseudogene. The genomes of lower, nonvertebrate eukaryotes, by contrast, frequently contain tightly linked tubulin genes. Protozoan parasites possess tubulin gene ensembles, either as separate tandem groupings of α- or β-tubulin genes (Leishmania spp. (19,20)) or as linked α/β tandem repeats (Trypanosoma brucei (21,22)). Similarly, some of the tubulin genes of the sea urchin Lytechinus pictus are organized in distinct α or β clusters (23).
The regulation of tubulin gene expression occurs at both transcriptional and translational levels. The tissue-specific and hormonally regulated expression of the β-tubulin genes of Drosophila is controlled both by upstream promoter elements and by negative and positive regulatory elements (silencers and enhancers) generally located within the first introns (24-28). Less is known regarding regulation of tubulin gene expression in vertebrates. TATA boxes are generally present in vertebrate α- and β-tubulin promoters (29,30), and high level expression of a Xenopus α-tubulin gene, XαT14, in oocytes is regulated by three CCAAT boxes, a "heat-shock-like" element (all located 60-200 bp upstream of the transcription start site), and their corresponding transcription factors (31). Cotranslational regulation of tubulin mRNA stability also contributes to control of cellular tubulin levels (32). When the pool of tubulin dimers is high, β-tubulin mRNAs are targeted for degradation by binding of a cellular factor to the ribosome-bound amino-terminal β-tubulin tetrapeptide (33,34).
Here we report the first example of clustered tubulin genes in a vertebrate, the Antarctic rockcod N. coriiceps. Three α-tubulin genes, designated NcGTbαa, NcGTbαb, and NcGTbαc, are tightly linked in an ~10-kb segment of DNA, with αa and αb linked head-to-head (5′ to 5′) and αa and αc tail-to-tail (3′ to 3′). The similarity of the nucleotide sequences of these genes, strikingly illustrated by an ~480-bp palindrome linking αa and αb, suggests that the cluster evolved approximately 7-31 million years ago by duplication, inversion, and divergence of a common ancestral α-tubulin gene. The neurally restricted expression of the αa/αb gene pair and the widespread expression of αc appear to be governed by distinct sets of promoter and enhancer elements. We have also identified a 285-bp element from the αa/αc intergenic region that is distributed widely in notothenioid genomes. We propose that expansion of the number of α-tubulin genes in the N. coriiceps genome facilitates the synthesis of α-tubulin chains at low temperature by providing additional templates for mRNA synthesis. The selective pressure favoring this expansion was probably the cooling of the Southern Ocean beginning ~25-40 mya. A preliminary report of some of this work has appeared (35).

EXPERIMENTAL PROCEDURES
Collection of Fish Tissues-Specimens of the Antarctic yellowbelly rockcod, N. coriiceps, were collected by bottom trawling from the R/V Hero or from the R/V Polar Duke near Low and Brabant Islands in the Palmer Archipelago. They were transported alive to Palmer Station, Antarctica, where they were maintained in seawater aquaria at −1.5 to 1°C. Tissues (testis, brain, gill, liver, spleen, blood, and muscle) were dissected, frozen in liquid nitrogen, and maintained at −70°C until use.
Frozen testis tissue from the New Zealand black cod, Notothenia angustata, was generously provided by Dr. Arthur DeVries (University of Illinois, Urbana).
Genomic Library Construction and Screening-A genomic library of N. coriiceps testicular DNA was constructed in the vector Charon 35 (42). High molecular weight DNA was digested partially with MboI, and fragments of 15-20 kb, obtained by sucrose gradient centrifugation, were ligated to the BamHI sites of the vector arms. Recombinant phage DNA was packaged in vitro (Packagene; Promega). The unamplified library was screened for clones encoding α-tubulin genes by hybridization (38) of nitrocellulose replicas of bacteriophage plaque DNA to the 32P-labeled chicken cDNA. Prehybridization and hybridization of the membranes were performed at moderate stringency (see "Southern Analysis of Genomic DNA") for 1 and 18-20 h, respectively. Positive plaques were detected autoradiographically as described above. One hundred twenty candidate α-tubulin genomic isolates, obtained from a primary screen of 500,000 recombinant phage, were classified as strongly, moderately, or weakly hybridizing. One strongly hybridizing isolate, designated S2, was carried through two additional rounds of plaque purification and screening, and single plaques were picked for clone stock preparation.
By using testicular DNA from N. angustata, we constructed a genomic library of 15-20-kb fragments in the vector LambdaGEM-11 (Promega). DNA was digested partially with MboI; fragments were ligated to phage arms containing XhoI half-sites, and recombinant phage DNA was packaged in vitro (Packagene; Promega). The library (titer = 1 × 10^6) was screened for candidate α-tubulin clones by hybridization to an N. coriiceps α-tubulin cDNA, NcTbα1, essentially as described for α-globin genes by Zhao et al. (43). Twelve candidate α-tubulin genomic isolates were obtained from a primary screen of 200,000 recombinant phage. Six of these isolates were carried through two additional rounds of plaque purification and screening, and single plaques were picked for clone stock preparation.
cDNA Library Construction and Screening-Total RNA was isolated from brain tissues of N. coriiceps (38,44), and poly(A)+ RNA was selected by oligo(dT)-cellulose affinity chromatography (45). Two different libraries were made. Oligo(dT)-primed cDNA synthesis and construction of the first library in λgt10 followed the procedures described by Huynh et al. (46). The second library was constructed in λZAP II (Stratagene); cDNA synthesis was primed with a mixture of random hexanucleotides (75%) and oligo(dT) (25%). The libraries were screened for recombinant clones bearing α-tubulin coding sequences by hybridization of nitrocellulose or nylon (MagnaLift, MSI, Westboro, MA) replicas of bacteriophage plaque DNA to the probe, 32P-labeled by nick translation (38) or by random priming (47). In early screens cα1 was used as probe, whereas in later screens α-tubulin cDNAs from N. coriiceps were employed. Hybridization and washing of the membranes were performed as described (12), and positive plaques were detected autoradiographically. A total of 159 candidate α-tubulin cDNA isolates were obtained from three screens (632,000 total recombinant phage) of the two libraries, and 80 of these were carried through tertiary plaque purification/screening. Three cDNA clones (designated NcTbα2, NcTbα7, and NcTbα8) from the second library that corresponded to the three α-tubulin genes (αb, αa, and αc, respectively) of the genomic cluster (Fig. 1) were sequenced (see below). The nucleotide sequence of NcTbα2 downstream of codon 168 was used to complete the sequence of the partial αb gene. Two cDNAs (NcTbα1 and NcTbα3) from the first library were also characterized.
Subcloning and DNA Sequence Analysis-Parental clones and restriction fragment or deletion (48) subclones were sequenced manually on both strands by use of the dideoxynucleotide chain termination method (49) and T4 DNA polymerase (Sequenase II; U. S. Biochemical Corp.). Portions of the sequence were established by use of the PRISM Ready Reaction Dye Deoxy Termination Cycle Sequencing Kit (Applied Biosystems), and the products were electrophoresed on an Applied Biosystems 373A automated DNA sequencer (University of Maine DNA Sequencing Facility).
Nucleotide and amino acid sequence analyses of the N. coriiceps α-tubulin genes, cDNAs, and their encoded proteins were performed by use of the Clustal method provided by DNASTAR MegAlign. DNA sequence relatedness was calculated as the similarity index of Dayhoff (50) as implemented by DNASTAR Align.
GenBank Accession Numbers-The sequence of the N. coriiceps α-tubulin gene cluster reported in this paper has been deposited in the GenBank™ data base under the accession number AF082027. The sequence of the cluster has been scanned against the GenBank™ data base using the BLASTN program (National Center for Biotechnology Information) to identify sequences with significant relatedness. Related sequences, and their accession numbers, are presented under "Results."
Northern Analysis of α-Tubulin Gene Expression-Total RNAs from testis, brain, gill, liver, spleen, blood, and muscle were isolated from tissues by a modification (44) of the acid guanidinium thiocyanate/phenol/chloroform method (51). RNAs (5 μg/slot) were applied to nylon membranes (MagnaGraph, MSI) by vacuum aspiration using a Bio-Rad Bio-Dot slot-blot apparatus. Sets of seven RNA samples were hybridized to PCR-generated, 32P-labeled probes specific for the 3′-UTRs of the cDNAs NcTbα2 (αb gene), NcTbα3, NcTbα7 (αa gene), or NcTbα8 (αc gene). To estimate the total α-tubulin mRNA in each tissue, a control set of RNA samples was hybridized to a fragment of NcTbα1 encoding amino acid residues 1-430. Prehybridization and hybridization of the membranes were performed in 5× SSPE (1× SSPE = 0.18 M NaCl, 0.01 M Na2HPO4·7H2O, 0.001 M EDTA), 5× Denhardt's solution (41), 50% formamide, 0.2% SDS at 42°C for 2 and 18-20 h, respectively, after which the membranes were washed sequentially with buffers of increasing stringency (final wash conditions = 0.1× SSPE, 42°C, 15 min). The membranes were exposed to Fuji RX x-ray film at −70°C with intensification.
PCR-based Gene Linkage Analysis-To determine the potential linkage of α-tubulin genes in the N. angustata genome, we employed a PCR-based strategy using as template phage DNAs purified from the six tertiary genomic clones (see "Genomic Library Construction and Screening"). Nondegenerate primers corresponding to highly conserved regions of the primary sequence of the N. coriiceps α-tubulins were synthesized as follows: 1) sense primer, 5′ CAGTTTGTGGACTGGTGC 3′ (residues 341-347, N-Gln-Phe-Val-Asp-Trp-Cys-C); 2) antisense primer, 5′ AGCTCCAGTCTCACTGAAG 3′ (reverse complement of coding sequence for residues 53-58; N-Phe-Ser-Glu-Thr-Gly-Ala-C). The primers were used in three combinations as follows: 1) sense alone to amplify tail-to-tail-linked genes; 2) antisense alone to amplify head-to-head-linked genes; and 3) sense plus antisense to establish head-to-tail linkage by difference (i.e. PCR products not shared with sense alone and antisense alone reactions). Each PCR reaction contained 3-5 ng of template DNA, 1.6 μM primers (0.8 μM of each primer when different), and CLONTECH Advantage™ KlenTaq polymerase mix (optimized for long distance PCR) (52). Touchdown PCR (53) was performed for 29 cycles using the following parameters: 1) denaturation steps, 94°C, 30 s; 2) annealing steps, first 9 cycles ramping the temperature from 70 to 62°C in 1° increments followed by 20 cycles at 62°C; and 3) extension steps, 68°C, 6 min. PCR products were analyzed on 1% agarose gels containing 1× TBE (0.089 M Tris borate, 2 mM EDTA, pH 8.0) and 0.0005% ethidium bromide. The ends of the PCR products were sequenced by the automated procedure to establish α-tubulin gene orientation.
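The primer design and cycling program described above can be checked with a short Python sketch. It translates the sense primer and the reverse complement of the antisense primer to confirm that they encode the quoted peptides, and it lists the annealing temperature for each of the 29 touchdown cycles. The function names are illustrative assumptions rather than part of the original procedure, and the reading-frame offset for the antisense primer is inferred from the peptide quoted in the text.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Primer sequences exactly as quoted in the text (5' to 3').
SENSE_PRIMER = "CAGTTTGTGGACTGGTGC"
ANTISENSE_PRIMER = "AGCTCCAGTCTCACTGAAG"


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; trailing bases are ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def touchdown_schedule(start=70, end=62, plateau_cycles=20):
    """Annealing temperature (deg C) per cycle: 1-degree steps, then a plateau."""
    ramp = list(range(start, end - 1, -1))  # 70, 69, ..., 62 (9 cycles)
    return ramp + [end] * plateau_cycles


if __name__ == "__main__":
    print(translate(SENSE_PRIMER))  # Gln-Phe-Val-Asp-Trp-Cys in one-letter code
    # The first base of the reverse complement belongs to the preceding codon,
    # so translation starts at the second base (assumption inferred from the text).
    print(translate(reverse_complement(ANTISENSE_PRIMER)[1:]))
    print(len(touchdown_schedule()))
```

Because each single-primer reaction can amplify only between inverted copies of the same site, a product from the sense primer alone implies tail-to-tail genes and a product from the antisense primer alone implies head-to-head genes, as stated in the text.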
Genomic Southern Analysis of a Repetitive DNA Element-During characterization of the N. coriiceps α-tubulin gene cluster, we discovered a 285-bp repetitive element. To determine the abundance, organization, and species distribution of this fragment, we hybridized it to Southern replicas of HindIII-digested genomic DNAs from Antarctic and temperate notothenioids, other temperate fishes, an amphibian, and a reptile. Restriction endonuclease digestion, electrophoresis, and transfer of DNAs were performed as described previously (12). Prehybridization of the membrane and subsequent hybridization to the 285-bp probe (labeled with 32P by random priming (47)) were performed as described by Detrich and Parker (12) with the following exceptions: 1) prehybridization was for 2 h; 2) the prehybridization/hybridization temperature was 63°C; and 3) the membranes were washed to final stringencies of 0.1-1× SSC, 63°C, for 15-40 min. The membranes were exposed to Fuji RX x-ray film as described above.

RESULTS
Estimation of α-Tubulin Gene Number in N. coriiceps-To estimate the number of α-tubulin genes possessed by N. coriiceps, we probed its genome for sequences complementary to α-tubulin cDNAs from the chicken (cα1) and from Chlamydomonas (α10-2). Fig. 1 shows that the α-tubulin probes hybridized to 10-15 different fragments in each restriction digest of the fish DNA. Furthermore, the hybridization patterns generated by the two heterologous cDNAs were virtually identical. These results suggest that the α-tubulins of N. coriiceps, like its β-tubulins (12), are encoded by a multigene family that is larger than those of higher vertebrates (14,16,39). Of particular note, the strong hybridization signals observed for some of the fragments raised the possibility that they contain multiple, linked α-tubulin genes.
Organization of an N. coriiceps α-Tubulin Gene Complex-To investigate the organization of the α-tubulin genes of N. coriiceps, we selected a strongly hybridizing clone, S2, that carried an insert of ~13.8 kb. Preliminary restriction mapping and Southern hybridization analysis suggested that the insert contained two or more α-tubulin genes in a segment of ~10 kb. Subsequent sequence analysis revealed that S2 contains two complete α-tubulin genes, designated NcGTbαa and NcGTbαc, and one partial gene, NcGTbαb, that abuts one end of the genomic fragment. Fig. 2 presents the organization and salient features of this gene complex. Two of the genes, αa and αb, are linked in head-to-head, or 5′ to 5′, orientation with ~500 bp separating their start codons. By contrast, the αa and αc genes are linked tail-to-tail (3′ to 3′) with ~2 kb between their poly(A) signal sequences. The approximately 4 kb of sequence to the left of the αc gene is devoid of α-tubulin coding sequences.
N. angustata Is a Temperate Congener of N. coriiceps-To determine whether this mesophilic species shares a similar organization of its α-tubulin genes, we probed its genome by Southern blot hybridization to the N. coriiceps NcTbα1 cDNA. The α-tubulin fragment patterns observed for HindIII-digested N. angustata and N. coriiceps DNAs shared some similarities in the low molecular weight region, but the temperate species contained few of the strongly hybridizing fragments of high molecular weight (>5 kb) that are suggestive of gene clustering in the cold-living fish (data not shown). We also examined six genomic α-tubulin clones from N. angustata for gene clustering by use of PCR-based linkage analysis. Two of the six clones gave the same 3-kb amplification product, which corresponded to head-to-tail linkage of a pair of α-tubulin genes with approximately 2.5 kb separating their respective coding sequences (data not shown). The remaining four clones apparently contained single α-tubulin genes. Although these surveys were not exhaustive, they do suggest that the extent of α-tubulin gene clustering in N. angustata is smaller than that of the Antarctic fish.
Three α-Tubulin Genes and Their Encoded Polypeptides-Fig. 3, A-C, shows the nucleotide sequences and translations of the NcGTbαa, NcGTbαb, and NcGTbαc genes, respectively. (For comparative purposes, the sequence of the αb gene downstream of Glu 168 has been completed from its cognate cDNA NcTbα2.) Table I gives estimates of the sequence similarities of these genes and subregions thereof. The αa, αb, and αc genes are quite similar to each other (80-83%), which suggests that they may have arisen by duplication and divergence of a common ancestral gene. Each gene contains three introns that interrupt the nucleotide sequence after codon 1, within codon 76, and after codon 125, positions that are highly conserved in other vertebrate α-tubulin genes (54). In general, αa appears to be most closely related to αb, except that its small introns 2 and 3 are quite similar to those of αc.
The three genes encode distinct, but closely related, α-tubulin polypeptides (Fig. 4). Compared pairwise, the αa-, αb-, and αc-tubulin chains are 98.4-98.9% identical to each other. With respect to other vertebrate α-tubulins, the three fish chains are very closely related to the αT6-tubulin of the ray Torpedo marmorata (97.6-97.8% sequence similarity; GenBank™ accession number P36220) and to two mammalian α chains, α1-tubulin of Chinese hamster (97.3-97.6% sequence similarity; accession number P05209) and the Mα2 isotype of mouse (97.1-97.3% sequence similarity; accession number P05213). Somewhat surprisingly, the N. coriiceps α-tubulins are only ~94% similar to two α-tubulin polypeptides from salmonid teleosts, the rainbow trout (Oncorhynchus mykiss testis-specific α chain, accession number P18288), and the chum salmon (O. keta α chain, accession number P30436). However, this apparent discrepancy most likely reflects the multiplicity of α-tubulin isotypes in vertebrates and the paucity of fish tubulin sequences available for comparison.
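A pairwise identity of the kind quoted above can be computed from two pre-aligned chains as in the following minimal sketch. It is not the Clustal/DNASTAR procedure used for the reported values; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it excludes gapped columns from the denominator (one common convention among several). The function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy peptides (hypothetical), differing at one of four aligned positions.
    print(percent_identity("MREC", "MREA"))
```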
FIG. 3 (caption fragment). Gene-specific probes derived from the 3′-UTRs, used to assess the expression of these genes in major tissues (Fig. 5), are shown underlined.
Like the Ncnβ1 β-tubulin cDNA of N. coriiceps (12), the NcGTbαa, αb, and αc genes show a strong preference for codons containing G or C in the third position. Although 57 (αa) to 58 (αb, αc) codons are used, the frequency of codons ending in G or C is 2.05 (αa) to 2.07 (αb, αc) times that of codons with third position A or T. The codon bias of the three α-tubulin genes stands in striking contrast to their A + T rich introns (see below) and to the G + C content (39-43%) of the genomes of closely related Antarctic nototheniid fishes (55). α-Tubulin genes from the chum salmon (accession number X66973) and the rainbow trout (accession number M36623) are similarly biased to third position G or C ((G + C)/(A + T) = 2.[...]), which suggests that mutational bias is a major factor influencing choice of synonymous codons in both fishes.
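The third-position bias discussed above is a ratio of codon counts, which the following Python sketch illustrates. It assumes an in-frame coding sequence with introns removed and counts every codon, including any terminal stop codon supplied by the caller; whether the reported values were computed in exactly this way is not stated in the text. The function name and the example sequence are hypothetical.

```python
def third_position_gc_ratio(cds):
    """Ratio of codons ending in G or C to codons ending in A or T."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_bases = cds[2::3]  # every third base, starting at codon position 3
    gc = third_bases.count("G") + third_bases.count("C")
    at = third_bases.count("A") + third_bases.count("T")
    if at == 0:
        raise ValueError("no codons end in A or T; ratio is undefined")
    return gc / at


if __name__ == "__main__":
    # Hypothetical 5-codon sequence: ATG CGT GAG TGC ATC
    print(third_position_gc_ratio("ATGCGTGAGTGCATC"))
```

A ratio near 1 would indicate no third-position preference; values near 2, as reported for the three genes, mean that roughly two of every three codons end in G or C.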
Expression of the α-Tubulin Gene Cluster-To determine whether the α-tubulin genes of the N. coriiceps cluster are functional, we used gene-specific probes complementary to their 3′-UTRs (see Fig. 3) to measure steady-state mRNA levels in seven tissues of N. coriiceps. Fig. 5 shows that the αc gene is expressed most widely (all tissues except liver), whereas αa and αb expression is restricted primarily to brain. The mRNAs for all three genes accumulate significantly, and to comparable levels, in neural tissues. αc mRNAs are also prominent in red blood cells and testis. A fourth α-tubulin gene that is not part of this cluster (represented by the NcTbα3 cDNA) also shows widespread expression. We conclude that each of the three α-tubulin genes of this cluster is functional and that regulation of the αa/αb gene pair differs from that of the αc gene.
Structural Features and Potential Regulatory Motifs of the Three α-Tubulin Genes-The striking similarity of the αa, αb, and αc genes, together with their unusual organization and differential expression, prompted a detailed comparison of their coding and noncoding regions.
5′-Promoter and -Untranslated Regions-The organization of the αa and αb genes as divergent transcription units with potentially overlapping promoters, and their probable evolution by gene duplication, inversion, and divergence, suggest that the two genes may share structural features in their 5′-noncoding sequences (i.e. promoters and untranslated sequences). Indeed, Fig. 6A shows that the 479-bp DNA segment that separates the start codons of the αa and αb genes possesses an axis of 2-fold rotational symmetry. Thus, this intergenic region is substantially palindromic (overall similarity index for the two halves = 78%), and the 5′-promoter and -untranslated regions of the two genes are strongly related. It is not surprising, then, that the two genes show an identical pattern of expression (Fig. 5). The initiator codons of αa and αb occur in contexts (CAAGCAATCATGG and CGAGCAATCATGG, respectively; cf. Fig. 3, A and B) that approximate the consensus signal for translation initiation in vertebrates, (GCC)GCC(A/G)CCATGG (57,58).
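The notion of 2-fold rotational symmetry used above can be made concrete with a small Python sketch: a DNA region is palindromic in this sense when its first half matches the reverse complement of its second half. The sketch scores that match position by position without gaps, which is a simplification and not the Dayhoff similarity index used for the 78% value. The function names and test sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rotational_symmetry_identity(region):
    """Percent of positions at which the left half equals the reverse
    complement of the right half (ungapped; a central base is ignored
    when the region has odd length)."""
    region = region.upper()
    half = len(region) // 2
    if half == 0:
        raise ValueError("region must contain at least two bases")
    left = region[:half]
    right_as_sense = reverse_complement(region[len(region) - half:])
    matches = sum(a == b for a, b in zip(left, right_as_sense))
    return 100.0 * matches / half


if __name__ == "__main__":
    print(rotational_symmetry_identity("GAATTC"))   # perfect palindrome
    print(rotational_symmetry_identity("GAATTG"))   # one mismatched position
```

Reading the right half as its reverse complement is equivalent to aligning the sense strands of two divergently transcribed genes, which is how the two halves of the intergenic region are compared in Table I.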
Despite the symmetry of the αa/αb intergenic region, we have found it difficult to identify basal and tissue-specific promoter elements that would explain the neural expression of the two genes. A potential, but noncanonical, TATA box (consensus TATAAA) (59) found upstream of the αa start codon and untranslated region (Fig. 6B; reverse complement shown in reversed text starting at −105) is not present in a corresponding location upstream of αb. Rather, the αb gene possesses a possible, but corrupt, TATA motif that begins at position −187. True CCAAT boxes (59, 60) are absent. The αa/αb intergenic region contains initiation response elements (consensus WWYACTYYY) (61) and C/EBP motifs (consensus TKNNGYAAK) (62), but most of these (the two initiation response elements and the two proximal C/EBP sites) map within the 5′-UTRs of the gene transcripts. Potential Sp1 sites (63) are also present. The apparent paucity of "legitimate" upstream promoter elements in the αa/αb intergenic region might indicate that the noncanonical, and irregularly located, motifs that we have described are functional. Alternatively, signals present in the first introns of the two genes may function as transcriptional regulators (see "Introns" below). Thus, determination of the actual promoter (and enhancer) elements of this gene pair will require deletion analysis of the αa/αb intergenic region and introns 1 using an appropriate reporter vector and host cell system (see "Discussion"). The neural expression of αa, αb, and αc, for example, may be conferred by enhancer elements located within the first introns of these genes (see below).
[...] (Table I). Consistent with its pattern of expression, the αc gene contains promoter elements characteristic of hematopoietic, neural, and testicular genes (63,64). A consensus TATA box begins 450 bp upstream of the start codon, and noncanonical TATA motifs are located at −119 and −470.³ Two bona fide CCAAT elements (60) begin at positions −159 and −210. Two GATA sites (consensus WGATAR) (64), the targets of GATA-binding transcriptional activators in subsets of blood, neural, and testicular cells (64-66), are found at position −411 and downstream of the proximal TATA box. One CACCC element (64) is located upstream of the distal CCAAT box, and a c-Myb (consensus ATTGAC) (63) site is present downstream of the proximal TATA box. Other sites that may contribute to expression of αc include single Sp1, Hox-1, and octamer motifs (63,67,68).
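Consensus motifs such as WGATAR or TKNNGYAAK are written in IUPAC ambiguity codes, and a search for them on both DNA strands can be sketched in Python as follows. This illustrates the general technique only and is not the software used for the analysis reported here; the function names and the short test sequences are hypothetical. Positions are 0-based and refer to the leftmost base of each hit on the forward strand.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, consensus):
    """Return sorted (start, strand, matched_sequence) tuples for a motif."""
    seq = seq.upper()
    core = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + core + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    width = len(consensus)
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the minus-strand coordinate back to the forward strand.
        start = len(seq) - m.start() - width
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_motif("CCAGATAAGG", "WGATAR"))
    print(find_motif("TTATCT", "WGATAR"))
```

Short consensus sequences of this kind occur frequently by chance, which is consistent with the caution expressed in the text that functional assignment requires deletion analysis.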
³ Numbering of the upstream sequences of αc is relative to its start codon because the transcription initiation site of this gene has not yet been mapped.

FIG. 4. Primary sequences of the αa-, αb-, and αc-tubulins. Amino acid residues that differ between the three polypeptides are shown by shaded rectangles. Residues of the αb- and αc-tubulins that are identical to αa are indicated by periods. The sequence of αb-tubulin beyond Glu 168 was deduced from the cognate cDNA, NcTbα2, of the αb gene.

TABLE I
Sequence comparison of regions of the N. coriiceps αa-, αb-, and αc-tubulin genes
Percentage similarities were calculated as the similarity index of Dayhoff (50). For the alignments, the K-tuple was set at 3, the gap penalty at 1, and the range at 40.
a Calculated for the coding sequence, introns, 100 nucleotides of sequence upstream of the translation start codon, and the 3′-UTR to and including the polyadenylation signal sequence. The αb sequence downstream of codon 168 was completed from its cognate cDNA, NcTbα2.
b Calculated jointly because transcription start sites were not mapped. The 479-bp region between the start codons of the αa and αb genes was divided into two equal fragments followed by alignment of the sense strands (cf. Fig. 6A). The αa and αb gene segments were also compared to a 240-bp segment upstream of the αc start codon.
c Calculated for sequence commencing after the stop codon and terminating after the probable polyadenylation signal.

Introns-The introns of the three N. coriiceps α-tubulin genes are noteworthy for their generally small sizes (986-1149, 83-102, and 102-103 bp for introns 1, 2, and 3, respectively) and their uniformly high contents of A + T residues (62-71%; length-weighted mean = 65.5%). Corresponding introns in human and frog α-tubulin genes (accession numbers X01703 and X07045, respectively) are considerably larger (intron 1 = 1527-3499 bp, intron 2 = 147-1024 bp, intron 3 = 183-303 bp), and their A + T contents range from 36 to 73% (weighted mean = 59.5%). In contrast to their introns, the coding sequences of the N. coriiceps αa, αb, and αc genes are relatively A + T-poor (45, 43, and 45% A + T, respectively), due in part to their biased usage of codons (see above). The intervening sequences of the αa, αb, and αc genes, considered separately, are also more divergent than their coding sequences (Table I). Fig. 7 shows the exon-intron boundaries of the three α-tubulin genes. The donor exon triplets located immediately to the 5′ sides of the splice sites are unusual: ATG for the first junction, T(C/T)G for the second, and CTG for the third versus the vertebrate consensus (C/A)AG (69) and the tubulin consensus ATG (54). Similarly, the 5′ nucleotide of the downstream acceptor exon rarely matches the vertebrate consensus residue, G. By contrast, intron sequences adjacent to the donor and acceptor junctions conform well to the vertebrate consensus. In particular, each intron contains the triplet GT(A/G) at its 5′ end and a pyrimidine-rich tract immediately upstream of the CAG triplet at its 3′ end (see also Fig. 3, A-C).
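A length-weighted mean A + T content, as quoted for the introns, weights each intron by its length, so that the long first introns dominate the average. The Python sketch below shows the calculation and contrasts it with the unweighted mean of per-sequence percentages. The function names and the toy sequences are hypothetical; the intron sequences themselves are not reproduced here.

```python
def at_percent(seq):
    """Percent of A and T residues in a single sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def length_weighted_mean_at(seqs):
    """Mean A + T percent with each sequence weighted by its length.

    This equals the pooled A + T count divided by the pooled length.
    """
    total_length = sum(len(s) for s in seqs)
    if total_length == 0:
        raise ValueError("no sequence data")
    weighted_sum = sum(at_percent(s) * len(s) for s in seqs if s)
    return weighted_sum / total_length


if __name__ == "__main__":
    toy_introns = ["AATT", "GC"]  # hypothetical: 100% and 0% A + T
    unweighted = sum(at_percent(s) for s in toy_introns) / len(toy_introns)
    print(unweighted)                            # simple mean of percentages
    print(length_weighted_mean_at(toy_introns))  # longer sequence counts more
```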
The absence of definitive basal promoter elements in the compact αa/αb intergenic region raises the possibility that each gene might be governed by promoter elements located in the first intron of the other. Two perfect inverse (i.e. reverse complementary) TATA elements reside in intron 1 of αa, the first 239 bp from its 5′ end, or 721 bp upstream of the start codon for the αb gene (Fig. 3A). Similarly, a near-perfect inverse TATA box is located 392 bp from the 5′ end of intron 1 of αb, or 874 bp before the start codon for the αa gene (Fig. 3B).
FIG. 5. Expression of the NcGTbαa, NcGTbαb, and NcGTbαc tubulin genes in tissues of N. coriiceps. Steady-state levels of mRNAs transcribed from the αa (ALA), αb (ALB), and αc (ALC) genes were assessed in seven tissues (right axis) by hybridization of slot blots of total RNA preparations (5 μg per tissue) to the gene-specific 3′-UTR probes shown in Fig. 3. Probes were generated from cDNAs corresponding to the genes (see "Experimental Procedures") by PCR. For comparison, expression of a fourth α-tubulin gene (using the 3′-UTR of the NcTbα3 cDNA) that is not part of this cluster was also evaluated. Total α-tubulin mRNA in each tissue was revealed by hybridization to a coding fragment of the NcTbα1 cDNA (encoding amino acid residues 1-430).
FIG. 6. 5′-Noncoding sequences upstream of the NcGTbαa, NcGTbαb, and NcGTbαc tubulin genes. A, palindromic nature of the 479-bp intergenic region linking the start codons of the αa and αb genes. The 2-fold rotational axis is indicated by the vertical line and attached arrows, and palindromic sequences are shown by light and dark shading. 5′-UTRs, deduced from the NcTbα7 and NcTbα2 cDNAs that correspond to the αa and αb genes, are indicated by underlining. (Due to the high degree of similarity of the palindromic 5′ sequences, it has proven impossible to design gene-specific oligonucleotides for precise mapping of the transcription start (capping) sites by primer extension (85) or by S1-nuclease protection (86).) Because transcriptional start sites have not been mapped, sequences are numbered with respect to the translational start codons (+1; noncoding nucleotides begin at −1) of the two genes. B, potential promoter and enhancer elements of the αa/αb intergenic region. TATA, C/EBP, initiation response element, and Sp1 elements are shown in reversed, boldface, boxed, and underlined text, respectively. C, promoter and enhancer elements within the 5′-noncoding sequences of the αc gene. Potential TATA, CCAAT, GATA, CACCC, Sp1, c-myb, Hox-1, and octamer motifs are shown in reversed, boldface, shadowed, double-underlined, single-underlined, bold underlined italic, bold italic, and underlined italic text, respectively.
A striking feature of tubulin gene expression in Drosophila is the occurrence of cis-acting regulatory sequences (often enhancers) in the first intron of several of the α- and β-tubulin genes that confer tissue-specific expression (70-73). We scanned the intronic sequences of the three α-tubulin genes of the N. coriiceps cluster for comparable elements and found two, the neural-specific enhancer CAAAAT and the maternal-specific enhancer CAAAAAT originally defined for the β1-tubulin gene of Drosophila (70). Fig. 3, A-C, shows that the αa gene contains three copies of the neural element and two of the maternal, αb one and zero, respectively, and αc four and two, respectively. These observations support our hypothesis that cis-acting sequences within the first introns of the clustered α-tubulin genes (particularly in the αa and αb genes) may contribute to regulation of their expression.
3′-Coding and -Untranslated Sequences-Comparison of the carboxyl-terminal coding sequences and the 3′-UTRs of the αa, αb, and αc genes reveals a remarkable degree of similarity (Table I and Fig. 8). The αa and αb UTRs are the most closely related, with 88% identity in the first 90 bp, and an overall similarity of 78%. However, the αa 3′-UTR is considerably shorter (127 bp) than are those of αb and αc (270 and 309 bp, respectively).
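Identity figures such as "88% in the first 90 bp" depend on how the two sequences are aligned and on how gap columns are scored, neither of which is specified here. The sketch below assumes two pre-aligned strings of equal length with `-` for gaps and counts a gap opposite a residue as a mismatch; the function name and the toy alignment are hypothetical.

```python
from typing import Optional


def percent_identity(aln_a: str, aln_b: str, first_n: Optional[int] = None) -> float:
    """Percent identity between two aligned sequences ('-' marks a gap).

    If first_n is given, only the first first_n alignment columns are used.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch (an assumption, not the paper's rule).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a.upper(), aln_b.upper()))
    if first_n is not None:
        columns = columns[:first_n]
    compared = [(a, b) for a, b in columns if not (a == "-" and b == "-")]
    if not compared:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# Hypothetical toy alignment, not the 3'-UTR sequences of Fig. 8.
utr_a = "ACGT-ACGTA"
utr_b = "ACGTTACCTA"
print(percent_identity(utr_a, utr_b))             # 8 of 10 columns match
print(percent_identity(utr_a, utr_b, first_n=4))  # first 4 columns only
```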
A Repetitive Sequence Element in the Notothenioid Genome-During characterization of the N. coriiceps α-tubulin gene complex, we also scanned the ~2-kb intergenic region located between the αa and αc genes against the GenBank™ data base to determine whether it shared significant sequence features with other genes. We found that a 285-bp fragment (Figs. 1 and 9) of the αa/αc intergenic region is ~90% similar to a bipartite element from intron 4 of the trypsinogen gene (accession number U58835) (74) of the Antarctic toothfish, Dissostichus mawsoni. No other significant matches were detected. The lone match to the Dissostichus intronic fragment is striking and raises the possibility that this shared sequence might constitute a repetitive element of notothenioid fishes. To determine the abundance and species distribution of this fragment, we hybridized it to Southern replicas of a panel of HindIII-digested genomic DNAs from Antarctic and temperate notothenioids, other temperate fishes, an amphibian, and a reptile (Table II). Among the notothenioid fishes, ~40-50 discrete bands were detected against a smeared background of positive DNA fragments, consistent with the partially structured dispersal of many copies of this element throughout their genomes (data not shown). This pattern is reminiscent of the distribution of two short interspersed nuclear elements in a subgroup of salmonid fishes (75). By contrast, the 285-bp fragment did not hybridize at all to the genomic DNAs of non-notothenioid fishes and more distantly related vertebrates. Given the apparent restriction of this repetitive element to the notothenioid suborder, we provisionally designate it Noto1. We are currently investigating the possibility that Noto1 is a mobile genetic element.
Evolutionary Divergence Times for the α-Tubulin Genes-Using as a metric the nuclear gene divergence rate (0.12-0.33%/million years) recently determined for the nonfunctional globin gene remnants of Antarctic icefishes (43), we can estimate the time of α-tubulin gene duplication. We considered substitutions at positions of 4-fold degeneracy in the coding sequences, which minimizes the influence of selection on molecular differences (76). Furthermore, transversions were analyzed because they accumulate linearly with respect to time (77). Taken pairwise, the 2.3-3.7% transversion frequency observed for the NcGTbαa, NcGTbαb, and NcGTbαc genes at 4-fold degenerate codons yields an estimated divergence time of ~7-31 million years. Thus, the cluster apparently evolved as the Southern Ocean cooled (1,2).

DISCUSSION

In this report we describe the first example of a vertebrate tubulin gene cluster, a complex of three tightly linked α-tubulin genes from the Antarctic yellowbelly rockcod, N. coriiceps. The αa, αb, and αc genes probably evolved by duplication, inversion, and divergence of an ancestral gene during the period when the Southern Ocean was cooling. We propose that cold adaptation of microtubule assembly in Antarctic fishes entails both the expression of numerically large α- and β-tubulin gene families and the unique sequence features of the encoded tubulin polypeptides.
Evolution of a Vertebrate α-Tubulin Gene Cluster by Gene Duplication-The striking similarity of the three α-tubulin genes that comprise the N. coriiceps cluster (97-98% coding sequence similarity, 80-83% overall similarity), and the clearly palindromic structure of the αa and αb genes, suggest that they evolved relatively recently from a common ancestral gene. Given the apparently large number of α-tubulin genes possessed by this fish, the identity of the ancestral gene is unclear. Nevertheless, we consider it likely that αa is the direct ancestor of αb (or vice versa) and gave rise to the latter gene through a recent duplication/inversion event that preserved neural-specific expression. Subsequent conversion (78) of the segment of the αa gene containing introns 2 and 3 to that of αc (or of the corresponding region of the αb gene to that of a fourth α-tubulin gene) would explain the regional similarities and dissimilarities within the cluster. Determination of the most plausible evolutionary scenario that explains the origin of the entire cluster will depend on analysis of other members of the α-tubulin gene family of N. coriiceps.
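The palindromic (2-fold rotational) symmetry invoked here can be quantified crudely by comparing a sequence with its own reverse complement, position by position. The sketch below is an ungapped comparison under that assumption; it is not the method used to produce Fig. 6, and an imperfect palindrome with insertions or deletions would need a gapped alignment instead. The function name and test sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def palindromic_fraction(seq: str) -> float:
    """Fraction of positions at which seq matches its own reverse complement.

    1.0 means a perfect palindrome in the molecular-biology sense
    (the sequence reads the same 5' to 3' on both strands).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    matches = sum(
        1 for i in range(n) if COMPLEMENT.get(seq[n - 1 - i]) == seq[i]
    )
    return matches / n


print(palindromic_fraction("GAATTC"))  # perfect palindrome (EcoRI site)
print(palindromic_fraction("AAAAAA"))  # no rotational symmetry
```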
It is intriguing to speculate that other α-tubulin genes may be linked to the αa-αc cluster, upstream of αc and/or downstream of αb, in orientations that create additional divergent transcription units. We plan to evaluate these possibilities by analysis of genomic clones that overlap S2 and by PCR-based linkage studies.
Adaptational Expansion of Tubulin Gene Templates-Based on the divergence rate (0.12-0.33%/million years) recently determined for the nonfunctional nuclear globin gene remnants of Antarctic icefishes (43), we estimate that the N. coriiceps α-tubulin gene cluster arose ~7-31 mya. Thus, duplication and divergence of members of the α-tubulin gene family apparently occurred in concert with, and probably was an adaptive change selected by, cooling of the Southern Ocean, which began ~38 mya and reached freezing temperatures during the mid-late Miocene (5-14 mya) (79). This conclusion must be qualified by recognition that gene conversion events within the α-tubulin cluster may have reduced the sequence heterogeneity of the individual genes (80), which would lead to underestimation of the true divergence time. However, it is noteworthy that the antifreeze glycoprotein genes of notothenioid fishes apparently evolved from a pre-existing pancreatic trypsinogen gene in a time frame similar to that which we have estimated for α-tubulin gene divergence (74).
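The divergence-time estimate rests on two steps: counting transversions at 4-fold degenerate codon positions, then dividing the observed divergence by the calibration rate. The sketch below illustrates both under stated assumptions: the coding sequences are already aligned in frame without gaps, the standard genetic code applies, and divergence is divided directly by the quoted rate (a simple division of the quoted ranges reproduces the ~7-31 million year interval). The function names and toy codons are hypothetical.

```python
# 4-fold degenerate codon families of the standard genetic code,
# identified by their first two nucleotides (Leu CTN, Val GTN, Ser TCN,
# Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def fourfold_transversion_pct(cds_a: str, cds_b: str) -> float:
    """Percent of shared 4-fold degenerate sites that differ by a transversion."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    sites = transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        # Score only codons that belong to the same 4-fold family in both genes.
        if codon_a[:2] == codon_b[:2] and codon_a[:2] in FOURFOLD_PREFIXES:
            sites += 1
            x, y = codon_a[2], codon_b[2]
            # A transversion exchanges a purine for a pyrimidine or vice versa.
            if x != y and ((x in PURINES) != (y in PURINES)):
                transversions += 1
    if sites == 0:
        raise ValueError("no 4-fold degenerate sites")
    return 100.0 * transversions / sites


def divergence_time_range(div_pct, rate_pct_per_my):
    """(min, max) divergence time in million years from two (low, high) ranges."""
    (d_lo, d_hi), (r_lo, r_hi) = div_pct, rate_pct_per_my
    return d_lo / r_hi, d_hi / r_lo


# Values quoted in the text: 2.3-3.7% transversions, 0.12-0.33%/million years.
lo, hi = divergence_time_range((2.3, 3.7), (0.12, 0.33))
print(f"{lo:.1f}-{hi:.1f} million years")  # 7.0-30.8 million years
```

The lower bound pairs the smallest divergence with the fastest rate and the upper bound pairs the largest divergence with the slowest rate, which is why the interval is so wide.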
We interpret the duplication of the α-tubulin genes as a molecular adaptation that amplifies the genomic templates for transcription of tubulin mRNAs, thus ensuring that RNA polymerase II, whose activity is likely depressed at the low body temperatures of Antarctic fish, can generate transcripts in quantities sufficient to support synthesis of α-tubulin polypeptides. To test this hypothesis rigorously, it will be necessary to examine the status of α-tubulin genes in a temperate nototheniid fish, such as the congeneric New Zealand black cod N. angustata. The black cod appears to have diverged from cold-adapted notothenioids and re-evolved a temperate phenotype as the waters around New Zealand warmed with the retreat of the Antarctic convergence ~5 mya (81,82). With the relaxation of positive selection pressure for maintenance of a large α-tubulin gene family, it is likely that N. angustata has lost some of these genes (80), a prediction consistent with our preliminary observation that the black cod lacks an α-tubulin gene cluster comparable to that of N. coriiceps. Further comparisons of α-tubulin gene numbers and organization in cold-adapted and temperate notothenioids are clearly warranted.

FIG. 9. A sequence element shared with the notothenioid trypsinogen gene. BLASTN (National Center for Biotechnology Information) comparison of the ~2-kb αa/αc intergenic region of the N. coriiceps gene complex to GenBank™ data base files detected significant sequence homology of a 285-bp fragment (Fig. 1) to the 5′ and 3′ ends of a 522-bp fragment from intron 4 of the trypsinogen gene (accession number U58835) (74) of the Antarctic toothfish, D. mawsoni. The sequences were aligned using the Clustal method provided by DNASTAR MegAlign. Regions of sequence identity are indicated by the shaded boxes. Gaps introduced to establish optimal alignment are shown by dashes.
Gene Regulatory Elements-The widespread expression of the αc gene and the neurally restricted expression of the αa/αb pair indicate that the two groups have evolved distinct cis-acting regulatory elements. However, our inability to delineate clear examples of upstream basal (e.g. TATA boxes) and tissue-specific promoter elements, particularly for the αa/αb gene pair, is perplexing. One possible interpretation is that the gene regulatory elements of Antarctic fishes may differ from those of higher vertebrates. We regard this possibility as unlikely because other notothenioid genes, such as the adult α-globin and myoglobin genes of N. coriiceps (43,83), possess suites of consensus basal and tissue-specific promoter and enhancer motifs. Rather, we suggest that some promoter and enhancer elements will be found in the first introns of the three α-tubulin genes. For example, the first intron of αa contains two inverse TATA boxes that may control transcriptional initiation from the αb gene, and intron 1 of αb contains a near-perfect inverse TATA motif that may regulate αa expression. We have also tentatively identified neural- and maternal-specific enhancers (CAAAAT and CAAAAAT, respectively) in the first introns. Originally defined for the Drosophila β1-tubulin gene (70), these sequences may function as specific regulators of tubulin expression throughout the metazoa. To test this proposal, we plan to determine the regulatory elements of the αa/αb intergenic region and its associated introns 1 by deletion analysis using a luciferase-expressing reporter vector and PC-12 cells to provide a neural microenvironment.
Biased Nucleotide Compositions of Noncoding Regions-We have reported previously that the introns and UTRs of α-globin genes from N. coriiceps and other cold-living and temperate vertebrate ectotherms are markedly rich in A + T residues, whereas the corresponding regions of α-globin genes from warm-bodied vertebrates are A + T-poor (43). Similar results have been reported for the myoglobin genes of Antarctic icefishes (83). The introns and UTRs of the three N. coriiceps α-tubulin genes described here share the A + T-rich intron/UTR profile of the cold-living ectotherms. The preference for AT base pairing may promote DNA replication and transcription at cold temperatures by facilitating strand separation, thereby enhancing access of DNA and RNA polymerases to their templates (43,83). This advantage may be particularly important at the extreme and chronically low temperatures experienced by Antarctic fishes.
Repetitive Elements in the Notothenioid Genome-We have identified within the α-tubulin gene cluster a 285-bp DNA sequence, Noto1, that is strikingly similar to part of intron 4 of the notothenioid trypsinogen gene (74). Southern analysis indicates that Noto1 is a repetitive element unique to notothenioid fishes and characterized by partially structured genomic dispersal. Although its nature remains undefined, this repetitive element shares no sequence similarity with the pericentric/telomeric satellite DNA element pIF (84) of notothenioid fishes. Thus, Noto1 constitutes a second DNA repeat that may prove useful in establishing the phylogenetic relationships of the notothenioid suborder.