Mouse alpha 1- and beta 2-syntrophin gene structure, chromosome localization, and homology with a discs large domain.

The syntrophin family of dystrophin-associated proteins consists of three isoforms, α1, β1, and β2, each encoded by a distinct gene. We have cloned and characterized the mouse α1- and β2-syntrophin genes. The mouse α1-syntrophin gene (>24 kilobases) consists of eight exons. The mouse β2-syntrophin gene (>33 kilobases) contains seven exons, all of which have homologues at the corresponding position in the α1-syntrophin gene. Primer extension analysis reveals two transcription initiation sites in the α1-syntrophin gene and a single site in the β2-syntrophin gene. The sequence immediately 5′ of the transcription start sites of both genes lacks a TATA box but is GC-rich and has multiple putative SP1 binding sites. The α1-syntrophin gene is located on human chromosome 20 and mouse chromosome 2, while the β2-syntrophin gene is on human chromosome 16 and mouse chromosome 8. Analysis of the amino acid sequence of the syntrophins reveals the presence of four conserved domains. The carboxyl-terminal 56 amino acids are highly conserved and constitute a syntrophin unique domain. Two pleckstrin homology domains are located at the amino-terminal end of the protein. The first pleckstrin homology domain is interrupted by a domain homologous to repeated sequences originally found in the Drosophila discs-large protein.

Syntrophin is a peripheral membrane protein of Mr ~58,000 that was first identified in the postsynaptic membrane of the Torpedo electric organ and subsequently shown to be present in many mammalian tissues (1). Interest in syntrophin arose first from its location at the neuromuscular junction and more recently from the demonstration that it is directly associated with dystrophin, the product of the Duchenne/Becker muscular dystrophy gene. Although the precise function of syntrophin is unknown, a potential role for the dystrophin-associated proteins in agrin-stimulated clustering of nicotinic acetylcholine receptors has implicated syntrophin in synaptogenesis (2).
The function of syntrophin is likely to be related to its association with dystrophin and other members of the dystrophin protein family (9–13). Proteins of the dystrophin family arise from three genes, through the use of alternative promoters within these genes and alternative splicing. Dystrophin, the major product in skeletal muscle, is a 427-kDa protein with an actin-binding amino terminus, 24 spectrin-like coiled-coil repeats, a cysteine-rich (CR) region, and a unique carboxyl terminus (CT) (14, 15). Shorter forms of dystrophin that contain the CRCT region and either no spectrin repeats (Dp 71) or only a few (e.g., Dp 116) are produced from the same gene via internal promoters (for review, see Ref. 54). Utrophin is highly related to dystrophin but is encoded by a separate gene (16, 17). A third gene encodes a Torpedo protein of Mr 87,000 that has modest but significant homology with the CRCT region of dystrophin (18). The association of syntrophin with each of these members of the dystrophin family suggests that it has a general role in their functions at the membrane (9).
Most members of the dystrophin protein family have highly restricted tissue distributions. The exception is utrophin, which, as its name implies, is nearly ubiquitous. Dystrophin is expressed primarily in skeletal muscle but can also be detected in cardiac muscle and brain. Dp 71 is found primarily in brain glial cells, liver, and stomach (19–21). Dp 116 is found only in glial cells, particularly Schwann cells (22). Other recently described proteins derived from alternative promoters within the dystrophin gene also appear to have restricted tissue distributions (23, 24).
The three syntrophins also show distinct expression patterns. Northern blot analysis has shown that α1-syntrophin, like dystrophin, is expressed primarily in skeletal muscle but is also present in cardiac muscle, kidney, and brain (5). In skeletal muscle, the subcellular distribution of α1-syntrophin is virtually identical to that of dystrophin: both proteins are associated with the sarcolemma and concentrated at the neuromuscular junction (25). β2-Syntrophin message is expressed at the highest levels in testis, brain, cardiac muscle, kidney, and lung (5), but only at low levels in skeletal muscle, where the protein is restricted to the neuromuscular junction (25). β1-Syntrophin message is found primarily in liver, with moderate levels in kidney, skeletal muscle, and lung but very low levels in brain and cardiac muscle (8). These distinct expression patterns most likely result from tissue-specific transcription. The predominant interactions between syntrophins and dystrophin family members in a given tissue may reflect these differences in expression and thus serve the specific functional requirements of that tissue.
As a prelude to understanding the transcriptional control of syntrophin, we report the cloning and characterization of the genes encoding two of the three syntrophins. We have mapped the loci for α1-syntrophin (gene symbol Snta1) and β2-syntrophin (gene symbol Sntb2) on the mouse and human chromosomes in order to identify potential links between mutations in these genes and known genetic disorders. Finally, analysis of the full-length amino acid sequences of the syntrophins has revealed two pleckstrin homology (PH) domains (26), a PDZ domain (so named for its presence in postsynaptic density protein-95 (PSD-95), the Drosophila discs large tumor suppressor protein, and the zonula occludens-1 protein (ZO-1)), and a highly conserved carboxyl-terminal syntrophin unique (SU) domain.

EXPERIMENTAL PROCEDURES
Isolation of Genomic Clones-Genomic clones encoding α1- and β2-syntrophin were isolated from a 129SV mouse genomic DNA library in λFIX II (Stratagene, La Jolla, CA) by screening 10⁶ phage with each of the respective cDNAs as described previously (5). Twelve α1-syntrophin clones and 17 β2-syntrophin clones were isolated. Based on restriction mapping and hybridization to cDNA fragments, four α1- and five β2-syntrophin clones were selected for further analysis. Restriction fragments of the genomic clones that hybridized to cDNA sequence were subcloned into pUC19 for sequencing. None of the β2-syntrophin clones contained exon 5, and repeated screening failed to isolate this part of the gene. Therefore, this exon and its flanking sequence were obtained by amplification of mouse genomic DNA using the GeneAmp XL PCR kit (Perkin-Elmer) and primer sets that spanned intron 4 (5′-AGCAGTGGAGGCCTGTTCTCATGG; 5′-GACTTCCTTGATCAGCTCAGCAGC) and intron 5 (5′-GGTTCATTCAGGTTCTGGATGTCGG; 5′-TCCATCATCAGCAGACATCTTCAGC). Intron lengths were determined by a combination of sequencing, restriction mapping of the genomic clones, and PCR analysis of mouse genomic DNA.
DNA Sequencing and Analysis-DNA sequencing was performed manually (27) and by the University of North Carolina, Chapel Hill, Automated DNA Sequencing Facility on a model 373A DNA sequencer (Applied Biosystems). Sequence assembly and analysis were performed using DNASTAR Lasergene software. The BLAST network service of the National Center for Biotechnology Information was used for databank searches.
Chromosome Localization-Genomic DNAs from C57BL/6J and Mus spretus, and a (M. spretus × C57BL/6J) F1 × M. spretus back-cross DNA panel of 94 DNAs, were obtained from The Jackson Laboratory (Bar Harbor, ME) (30). Southern blotting and hybridization were carried out as described previously (31). To find a suitable restriction fragment length polymorphism (RFLP) for mapping, approximately 5 μg of genomic DNA from each of the C57BL/6J and M. spretus progenitors was digested with 28 different restriction enzymes. Southern blots were probed with either mouse α1-syntrophin or β2-syntrophin cDNA (5). For each back-cross sample, approximately 2 μg of DNA was digested overnight with HincII. Segregation of the Snta1 and Sntb2 alleles was compared with that of other loci in the database of The Jackson Laboratory Backcross DNA Map Panel Service (30). The gene symbols Snta1 and Sntb2 have been approved by the International Mouse Nomenclature Committee.
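Screening enzymes for an RFLP comes down to comparing the fragment sizes that a digest produces from the two progenitor alleles. The sketch below illustrates the idea for HincII, whose recognition site is GTYRAC (Y = C/T, R = A/G) with a blunt cut between Y and R. The helper names and the two allele sequences are hypothetical toys, not syntrophin sequence, and a real Southern blot shows only fragments that hybridize to the probe, which this sketch does not model.

```python
import re

# HincII site GTYRAC; the lookahead lets overlapping sites be found.
# The degenerate site is its own reverse complement, so one strand suffices.
HINCII_SITE = re.compile(r"(?=GT[CT][AG]AC)")
CUT_OFFSET = 3  # blunt cut after the third base (between Y and R)


def hincii_fragments(seq: str) -> list[int]:
    """Fragment lengths, 5' to 3', from a complete HincII digest of a linear sequence."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in HINCII_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def polymorphic_fragments(allele_a: str, allele_b: str) -> tuple[list[int], list[int]]:
    """Fragment sizes seen in one allele's digest but not in the other's."""
    fa, fb = hincii_fragments(allele_a), hincii_fragments(allele_b)
    return sorted(set(fa) - set(fb)), sorted(set(fb) - set(fa))


# Invented alleles: the second HincII site (GTCGAC) is mutated in allele_2.
allele_1 = "A" * 10 + "GTTAAC" + "A" * 10 + "GTCGAC" + "A" * 10
allele_2 = "A" * 10 + "GTTAAC" + "A" * 10 + "GTCGTC" + "A" * 10

print(hincii_fragments(allele_1))                 # [13, 16, 13]
print(hincii_fragments(allele_2))                 # [13, 29]
print(polymorphic_fragments(allele_1, allele_2))  # ([16], [29])
```

Here the loss of one site merges two fragments into one larger fragment, which is the kind of presence/absence difference scored on the back-cross blots.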
To identify the human chromosomes carrying the syntrophin genes, a hamster-human somatic cell hybrid panel (BIOS, New Haven, CT) was screened with the α1- and β2-syntrophin cDNAs by Southern blotting as described previously (5).

RESULTS
Exon/Intron Structure of Mouse α1- and β2-Syntrophin Genes-In order to analyze the syntrophin gene structure and identify putative transcriptional regulatory elements, genomic clones encoding α1- and β2-syntrophin were isolated from a 129SV mouse phage library. The sequence of each exon and its adjacent sequence was determined using sequencing primers derived from the cDNA. The relative positions and sizes of the exons and introns of each gene are shown in Fig. 1.
The α1-syntrophin gene is over 24 kb in length and contains eight exons. The smallest is exon 5 (131 bp) and the longest is exon 8 (580 bp). Exon 1 is either 367 or 408 bp long, depending on the site used for transcription initiation (see below). The length of exon 6 depends on which 3′ splice site of intron 5 is used. One of the cDNA clones previously isolated, BC10, encoded an additional 4 amino acids (SSAH) not present in three other independently isolated clones (5). The 12 nucleotides encoding this sequence could be included in exon 6 by use of an alternative 3′ splice site (Fig. 2). This splice occurs at a 3′ TG rather than the highly conserved AG dinucleotide and is therefore likely to be rare. Consistent with this, the additional 12 nucleotides were observed in only one of four mouse cDNA clones and are absent from the Torpedo (5), rabbit (6), and human (7) α1-syntrophin cDNAs. Exon 8 contains the TAG stop codon followed by 490 bp of 3′-untranslated sequence, which includes the polyadenylation signal sequence.
The β2-syntrophin gene is over 33 kb long and contains seven exons. The smallest is exon 4 (143 bp), and the largest, exon 7, is over 1690 bp. The exact size of exon 7 is unknown because no polyadenylation signal sequence was found in either the genomic clone or the 3′ end of the cDNA (5). The positions of the introns relative to the amino acid sequence and the exon/intron border sequences of both syntrophin genes are shown in Fig. 2. All introns have the conserved GT and AG dinucleotides at the donor and acceptor sites, respectively.
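The GT...AG rule and the reading-frame consequence of the alternative exon 6 acceptor can both be checked mechanically. A minimal sketch with hypothetical helper names; the intron sequences are invented toys, not the border sequences shown in Fig. 2.

```python
"""Sketch: check canonical GT...AG intron boundaries and the frame effect
of an alternative 3' splice site. All sequences are invented examples."""


def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def alternative_acceptor_effect(extra_nt: int) -> tuple[bool, int]:
    """For an acceptor shifted upstream so that extra_nt intronic bases are
    retained in the exon, return (reading frame preserved, codons added)."""
    return extra_nt % 3 == 0, extra_nt // 3


toy_introns = {
    "canonical": "GTAAGTATTTTCTTTTCAG",    # GT ... AG
    "tg_acceptor": "GTAAGTATTTCCCTTTCTG",  # GT ... TG, non-canonical
}
for name, seq in toy_introns.items():
    print(name, is_canonical_intron(seq))

# A 12-nt extension, as for the SSAH-encoding BC10 variant:
print(alternative_acceptor_effect(12))  # (True, 4)
```

Because 12 is a multiple of 3, the extra sequence adds exactly four codons without shifting the downstream frame, matching the four extra residues encoded by BC10.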
The exon sequences of the α1-syntrophin and β2-syntrophin genes were identical to the sequences of the previously characterized cDNAs (5). The previously reported mouse β2-syntrophin cDNA (5) lacked a substantial portion of the 5′ sequence. Genomic clones containing β2-syntrophin exon 1 extended the cDNA sequence by 327 bp, including 264 bp of coding region (88 amino acids). We have also isolated a β2-syntrophin cDNA clone with sequence corresponding to that of exon 1; it contains the exon 1 coding sequence, but its 5′-untranslated region is 15 bp shorter than that of the genomic sequence (data not shown). The full-length amino acid sequence of β2-syntrophin, derived from the exon 1 sequence combined with the previously reported cDNA sequence (5), is shown in Fig. 3; the additional amino acids derived from the genomic sequence are underlined. The first 5′ methionine codon in frame with the previously reported β2-syntrophin sequence is in a context favorable for translation initiation (32). The β2-syntrophin encoded by this full-length sequence is 520 amino acids long, with a calculated molecular mass of 56,388.5 Da. The full-length mouse β2-syntrophin amino acid sequence shares 46% identity with mouse α1-syntrophin and 55% identity with human β1-syntrophin. β2-Syntrophin was originally classified as a basic (β) form because the partial sequence indicated a high isoelectric point. The calculated pI of 8.7 for the full-length protein confirms that it is correctly placed among the β-syntrophins.
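Both the molecular mass and the pI quoted above are computed from the amino acid sequence. A minimal sketch, assuming ExPASy-style average residue masses and one textbook pKa set; the text does not state which values were used, and other pKa sets shift a computed pI by a few tenths of a pH unit. Function names are hypothetical.

```python
"""Sketch: average molecular mass and isoelectric point (pI) from a protein
sequence. Assumed constants: ExPASy-style average residue masses and a
textbook pKa set."""

# Average residue masses (Da), i.e. amino acid minus one water.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # restores the terminal H and OH of the chain

# Assumed pKa values: basic groups are +1 when protonated,
# acidic groups are -1 when deprotonated.
PKA_BASIC = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def average_mass(seq: str) -> float:
    """Average molecular mass in Da: residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH for zero net charge; charge falls monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(264 // 3)                       # coding nucleotides -> residues: 88
print(round(average_mass("G"), 2))    # free glycine: 75.07
print(isoelectric_point("KKKK") > 7)  # lysine-rich peptide is basic: True
```

Bisection is used because net charge decreases monotonically with pH, so there is exactly one crossing of zero between pH 0 and 14; a protein whose pI lands well above 7, as for full-length β2-syntrophin, is classed as basic.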
Transcription Initiation Sites-The transcription initiation sites of both genes were determined by primer extension and confirmed by ribonuclease protection assays. RNA isolated from skeletal muscle and testis (the richest sources of α1- and β2-syntrophin message, respectively) was used for these studies. Ribonuclease protection assays showed two bands with an α1-syntrophin probe and a single band with a β2-syntrophin probe (data not shown). Primer extension identified two initiation sites in the α1-syntrophin gene, at nucleotide positions −76 and −117 relative to the translation initiator ATG codon (Fig. 4, left panel). The primer extension band at −117 is more intense than that at −76, so −117 is likely to be the major initiation site. A single site was identified in the β2-syntrophin gene, at position −65 (Fig. 4, right panel). A similar experiment using RNA isolated from heart produced a band at position −86, whereas brain and kidney RNA showed only the −65 band (data not shown).
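Positions in this section are numbered relative to the translation initiator ATG. Assuming the usual convention (the A of ATG is +1, the base immediately 5′ of it is −1, and there is no position 0), the spacing between start sites and the two exon 1 lengths can be related as in this sketch; the helper names are hypothetical.

```python
"""Sketch: arithmetic on positions numbered relative to the translation
initiator ATG. Assumed convention: the A of ATG is +1, the base just
5' of it is -1, and there is no position 0."""


def to_offset(pos: int) -> int:
    """Convert an ATG-relative position to a 0-based offset (A of ATG -> 0)."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos - 1 if pos > 0 else pos


def distance(a: int, b: int) -> int:
    """Separation in nucleotides between two ATG-relative positions."""
    return abs(to_offset(a) - to_offset(b))


def leader_length(start: int) -> int:
    """5'-untranslated leader length for a transcript starting at `start` (< 0)."""
    if start >= 0:
        raise ValueError("a transcription start 5' of the ATG must be negative")
    return -start


print(distance(-117, -76))                       # 41
print(leader_length(-117) - leader_length(-76))  # 41, as between 408- and 367-bp exon 1
print(distance(-1, 1))                           # 1: there is no position 0 to count
```

The 41-nt separation of the α1-syntrophin start sites equals the difference between the two exon 1 lengths (408 − 367 bp), as expected if the two transcripts differ only at their 5′ ends.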
Sequence of the Promoter Region-The sequence 5′ of the α1- and β2-syntrophin coding regions is shown in Fig. 5. Neither gene contains a standard TATA box, but both have multiple GGGCGG elements (putative SP1 binding sites). The α1-syntrophin promoter contains two CANNTG E-box motifs, at positions −240 and −659. E-box elements in many other promoters bind members of the basic helix-loop-helix family of regulatory factors and often regulate the expression of muscle-specific genes (33). Other putative transcriptional regulatory elements in the α1-syntrophin gene include a purine-rich Ets-1 site with a GGAA core at position −262 (34), a CCAAT box at position −377, and an 8-nucleotide inverted repeat starting at nucleotides −625 and −595. The β2-syntrophin gene contains an Ets-1 site at position −183, two CCAAT boxes at nucleotides −407 and −539, a …
A HincII RFLP for Snta1 (encoding α1-syntrophin) was identified by the presence of 13.5- and 5.4-kb genomic DNA fragments in C57BL/6J and their absence in M. spretus (Fig. 6A). This allele was characterized in 88 DNAs from the (C57BL/6J × M. spretus) F1 × M. spretus back-cross panel. Haplotype analysis of these mapping data (Fig. 6B) indicated that the Snta1 locus is closely linked to D2Mit22 (DNA fragment MIT 22) on mouse chromosome 2. Allelic segregation patterns for D2Mit22 and Snta1 were identical (no recombinants), indicating a distance of less than 1 cM between these two loci. The calculated map distances between Snta1 and the adjacent loci D2Bir21 (DNA fragment BIR 21) and D2Mit52 (DNA fragment MIT 52), including 95% confidence limits, were: D2Bir21 – 6.8 ± 2.7 cM – Snta1 – 8.0 ± 2.9 cM – D2Mit52.
A HincII RFLP was also identified for Sntb2 (encoding β2-syntrophin), by the presence of a 2.2-kb genomic DNA fragment in C57BL/6J and its absence in M. spretus (Fig. 7A). This allele was characterized in the 88 DNAs from the (C57BL/6J × M. spretus) F1 × M. spretus back-cross panel. Haplotype analysis of these mapping data is shown in Fig. 7B. The Sntb2 locus is closely linked to D8Bir25 (DNA fragment BIR 25) on mouse chromosome 8. The calculated map distances between Sntb2 and the adjacent loci D8Bir25 and D8Bir26 (DNA fragment BIR 26), including 95% confidence limits, were: D8Bir25 – 1.1 ± 1.1 cM – Sntb2 – 5.7 ± 2.5 cM – D8Bir26.
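In a back-cross, the map distance between two loci is estimated from the fraction of animals whose allele calls differ at the two loci. The ± values reported above are consistent with one binomial standard error for 6, 7, 1, and 5 recombinants among 88 animals; those counts are our back-calculation for illustration, not numbers stated in the text. A minimal sketch with hypothetical helper names, expressing distance directly as percent recombination (no Haldane or Kosambi mapping function, an assumption):

```python
"""Sketch: recombination-based map distance from back-cross typing data.
Distances are percent recombination with no mapping function, and the
error is one binomial standard error."""
import math


def count_recombinants(locus_a: list[str], locus_b: list[str]) -> int:
    """Count animals whose allele calls differ between two loci.
    Calls: 'B' = carries the C57BL/6J allele, 'S' = M. spretus homozygote."""
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must be typed in the same animals")
    return sum(a != b for a, b in zip(locus_a, locus_b))


def map_distance_cm(recombinants: int, n: int) -> tuple[float, float]:
    """Return (distance in cM, standard error in cM) for r recombinants out of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = recombinants / n
    return 100 * p, 100 * math.sqrt(p * (1 - p) / n)


# Toy typing of five animals at two loci; animals 2 and 5 are recombinant.
print(count_recombinants(list("BSBSB"), list("BBBSS")))  # 2

# Back-calculated recombinant counts (n = 88) for the reported intervals.
for label, r in [("D2Bir21-Snta1", 6), ("Snta1-D2Mit52", 7),
                 ("D8Bir25-Sntb2", 1), ("Sntb2-D8Bir26", 5)]:
    d, se = map_distance_cm(r, 88)
    print(f"{label}: {d:.1f} ± {se:.1f} cM")
```

The four printed intervals reproduce the reported values (6.8 ± 2.7, 8.0 ± 2.9, 1.1 ± 1.1, and 5.7 ± 2.5 cM).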
The human chromosomes containing the α1- and β2-syntrophin genes were identified using hamster-human and mouse-human hybrid cell lines. Southern blot and PCR analyses indicated that SNTA1 is located on human chromosome 20 and SNTB2 on human chromosome 16 (data not shown).
Protein Domain Homology-Acquiring the full-length amino acid sequence of β2-syntrophin enabled us to compare it with the sequences of the other syntrophins and with sequences in GenBank. These comparisons revealed that all three syntrophins are composed of four domains (Fig. 8). The presence of two pleckstrin homology domains in the syntrophins, and a comparison of their sequences with those of other proteins containing similar domains, has been reported previously (26). In addition, we and Lue et al. (36) have found that a 90-amino acid segment of syntrophin is homologous to the PDZ domain. Alignment of this syntrophin domain with the PDZ domains of PSD-95, human and Drosophila discs large protein, ZO-1, and selected other proteins shows that the syntrophin PDZ domain contains many of the defining elements of this motif, including the conserved GLG(F/I) sequence (Fig. 8A). Finally, the carboxyl-terminal 56 amino acids of syntrophin are highly conserved across species and isoforms (5, 6, 8) and thus represent a potential fourth domain. This region has no homology with proteins other than the syntrophins and therefore constitutes an SU domain.

DISCUSSION
The interactions between the syntrophins and dystrophin (and other members of the dystrophin family) are direct (7, 10–13). One site of interaction lies in a region of the cysteine-rich domain encoded by exon 74 of the dystrophin gene (11–13), an exon that is subject to alternative splicing (37). Furthermore, the syntrophins can interact with more than one member of the dystrophin family. For example, α1-, β1-, and β2-syntrophin each bind homologous sequences of dystrophin, utrophin, and the 87-kDa protein (7). Thus, the specificity of the interaction between syntrophin isoforms and dystrophin family members, which is apparent from the differential localization of α1- and β2-syntrophin in skeletal muscle (25), must depend on other factors.
These may include posttranslational modifications, associations with other proteins, and transcriptional regulation of expression. The tissue distribution patterns of the syntrophin isoforms and of the dystrophin family members implicate tissue-specific transcription as a potential regulator of dystrophin/syntrophin isoform association. As a first step in investigating syntrophin transcriptional regulation, we have isolated and characterized genomic clones encoding α1- and β2-syntrophin.
The α1-syntrophin gene is over 24 kb in length and contains seven introns. The β2-syntrophin gene is over 33 kb long and contains at least six introns. Comparison of the positions at which the introns interrupt the coding sequence shows that every intron in the β2-syntrophin gene occurs at the corresponding position in the α1-syntrophin gene (Figs. 1 and 2). The α1-syntrophin gene contains an additional intron dividing the sequence that in β2-syntrophin is a continuous first exon (Fig. 1). Such conservation of intron positions is frequently observed in genes derived from a common ancestor.
The α1-syntrophin gene has two major transcription initiation sites, 41 nucleotides apart. cDNAs with 5′ ends near each start site were obtained during cDNA cloning (5). The site at position −76 is 35 nucleotides 5′ of the start of cDNA clone BC5, and the site at −117 extends cDNA clone BC10 by 7 nucleotides. Because the two transcripts differ by only 41 nucleotides, Northern blots did not resolve them; instead, a single broad band of ~2.4 kb was observed (5). A single transcription start site was identified in the β2-syntrophin gene. This suggests that the three sizes of β2-syntrophin message observed on Northern blots (2, 5, and 10 kb) (5) most likely result from 3′-untranslated regions of different lengths and/or incomplete removal of introns.
FIG. 3. The complete sequence of mouse β2-syntrophin was derived from the novel amino-terminal 88 amino acids translated from the genomic sequence (underlined) linked to the previously published partial cDNA sequence (5).
FIG. 4. Primer extension analysis of the transcription initiation sites of the α1- and β2-syntrophin genes. Primer extension of mouse skeletal muscle RNA using a primer to α1-syntrophin gives two bands (arrows) separated by 41 nucleotides (lanes 1 and 2). Lanes 1 and 2 represent primer extension products obtained with two different protocols (see "Experimental Procedures"). A potential third band seen in lane 1 was not confirmed in either the second primer extension (lane 2) or ribonuclease protection assays (data not shown). Extension of mouse testis RNA with a primer to β2-syntrophin results in a single major band (lane 3). The marker sequence (M) was derived from M13mp18 and is loaded in the order GATC.
The promoter regions of both syntrophin genes are very GC-rich, contain no identifiable TATA box, and have multiple putative SP1 binding sites. This type of promoter is common in housekeeping genes, although the Dp 71 promoter is of the same type (38). Expression of syntrophin mRNAs varies greatly among tissues, suggesting that tissue-specific regulatory elements are present within the genes. A candidate for such an element is the E-box motif present in both genes. Surprisingly, the β2-syntrophin gene, which is expressed in many tissues but only at low levels in muscle, contains a putative CArG box, an element often present in genes encoding muscle-specific proteins. Functional analysis of each promoter will enable identification of the elements responsible for the distinct expression patterns of α1- and β2-syntrophin.
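GC-richness and candidate elements such as SP1, E-box, Ets, CCAAT, and CArG sites are typically located by consensus-pattern searches of the upstream sequence. A minimal sketch, assuming simplified consensus patterns (the CArG consensus CC(A/T)6GG is a standard definition not given in the text), hypothetical helper names, and an invented toy sequence; hits found this way are only putative sites, which is why functional promoter analysis is still required.

```python
"""Sketch: scan a promoter sequence for consensus motifs and report
ATG-relative positions. Patterns are simplified consensus forms
(assumptions), and only the given strand is scanned."""
import re

MOTIFS = {
    "SP1 (GGGCGG)": "GGGCGG",
    "E-box (CANNTG)": "CA..TG",
    "Ets core (GGAA)": "GGAA",
    "CCAAT box": "CCAAT",
    "CArG box": "CC[AT]{6}GG",  # CC(A/T)6GG consensus
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def scan_promoter(upstream: str) -> list[tuple[str, int]]:
    """Return (motif, position) hits sorted 5' to 3'.

    `upstream` must end at the base just 5' of the ATG, so index i maps to
    position i - len(upstream) and the last base is -1.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # Lookahead so that overlapping matches are all reported.
        for m in re.finditer(f"(?={pattern})", seq):
            hits.append((name, m.start() - len(seq)))
    return sorted(hits, key=lambda hit: hit[1])


toy = "TT" + "GGGCGG" + "TT" + "CAGCTG" + "TT" + "CCAAT" + "TT"  # invented
print(scan_promoter(toy))
print(round(gc_fraction(toy), 2))  # 0.48
```

The E-box and CArG consensus sequences are their own reverse complements, so a single-strand scan finds them on either strand; SP1, CCAAT, and GGAA sites on the opposite strand would appear as their reverse complements (e.g., CCGCCC, ATTGG) and are missed by this sketch.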
To identify potential associations between known genetic disorders and mutations in syntrophin genes, we have mapped the α1- and β2-syntrophin genes (Snta1 and Sntb2) on both human and mouse chromosomes. Human chromosome mapping using somatic cell hybrids indicated that the SNTA1 and SNTB2 loci are located on human chromosomes 20 and 16, respectively. Our data agree with results from the Kunkel laboratory (7, 8), which show that the human genes for α1-, β1-, and β2-syntrophin are located at 20q11, 8q23-24, and 16q23, respectively.
The mouse syntrophin genes were mapped to loci within long linkage groups conserved between human chromosome 20 and mouse chromosome 2 (α1-syntrophin) and between human chromosome 16 and mouse chromosome 8 (β2-syntrophin). This information enabled us to compare these locations with those of known mouse neuromuscular genetic disorders. No likely candidates were found within a reasonable distance of the α1-syntrophin gene. A potential candidate for a β2-syntrophin mutation may be the myodystrophic (Myd) mouse (39). The Myd locus, which harbors the mutation responsible for the myodystrophic phenotype, has been mapped to chromosome 8 at a site ~3 cM from the β2-syntrophin gene. Given the error limits of genetic mapping, the Myd locus and the β2-syntrophin gene cannot be resolved as separate genes. Experiments are currently under way to determine whether β2-syntrophin is the gene product altered by the Myd mutation.
Analysis of the amino acid sequences of all three syntrophins shows that syntrophin is composed of four protein domains. Each syntrophin contains two PH domains, a PDZ domain, and a highly conserved carboxyl-terminal SU domain. The PH domain is a sequence of approximately 100 residues found in many proteins involved in intracellular signaling (26). Nuclear magnetic resonance and crystal structure analyses have shown that this domain in β-spectrin and dynamin consists of seven strands of β-sheet and one α-helix (40, 41). In all three syntrophins, the amino-terminal PH domain is interrupted by insertion of the PDZ domain into the loop connecting the c and d β-strands. Because the insertion lies in a loop rather than within a strand, it is unlikely to disrupt the structure of this PH domain. Only a few other proteins with PH domains contain insertions at this site (26). The largest of these is the 340-amino acid insertion in the PH domain of phospholipase Cγ, which contains two SH2 domains and an SH3 domain. The syntrophin PDZ domain is highly conserved among the three isoforms and across species from Torpedo to human (Fig. 8). The SU region is also highly conserved among the isoforms and across species (5).
The roles these domains play in syntrophin function are not understood, but functional studies of the same motifs in other proteins will likely help clarify them. The PH domain has been implicated in the association of β-adrenergic receptor kinase with G protein βγ subunits (Gβγ) (42), phospholipase C binding to phosphatidylinositol 4,5-bisphosphate (43), βG spectrin binding to membranes (44), and protein kinase C association with Bruton's tyrosine kinase (45). Thus, the presence of two PH domains in syntrophin raises the possibility that this protein, and therefore the dystrophin complex, may mediate intracellular signaling pathways.
The PDZ domain may be involved in targeting syntrophin to the membrane in a manner similar to that proposed for PSD-95 (46). Recently, two proteins containing PDZ domains have been shown to bind erythroid protein 4.1 (36, 47); syntrophin could therefore potentially associate with members of the protein 4.1 family. Because syntrophin is the only protein containing a PH domain or a PDZ domain that is known to bind dystrophin, it is unlikely that either of these domains alone is responsible for the syntrophin-dystrophin association. Rather, it appears that either the PH and PDZ domains act together to bind dystrophin or the dystrophin binding site is located in the SU domain. The immunoprecipitation of dystrophin family proteins by the carboxyl-terminal two-thirds of syntrophin (7) further supports the SU region as the dystrophin binding domain. Further studies of the syntrophin domains will allow us to identify the binding sites for dystrophin and potentially for other proteins of the dystrophin complex.
FIG. 8. Alignment of the syntrophin PDZ domain with the PDZ domains of other proteins. A, identical and structurally conserved amino acids are shaded. Proteins included are mouse α1- and β2-syntrophin (5); human β1-syntrophin (8); Torpedo α1-syntrophin (5); Drosophila discs-large protein (49); human discs-large protein (36); rat postsynaptic density protein (PSD-95) (46); rat nitric-oxide synthase (50); human lymphocyte chemoattractant factor (GenBank accession number M90391); human tyrosine phosphatase (51); mouse zonula occludens-1 protein (ZO-1) (52); and human erythrocyte p55 (53). B, structural diagram of syntrophin showing the relative locations of the PH domains, the PDZ domain, and the SU domain.