Structure of the human laminin alpha2-chain gene (LAMA2), which is affected in congenital muscular dystrophy.

We have determined the structure and complete exon size pattern of the human laminin α2-chain gene (LAMA2), which has been shown to be affected in congenital muscular dystrophy (Helbling-Leclerc, A., Zhang, X., Topaloglu, H., Cruaud, C., Tesson, F., Weissenbach, J., Tomé, F. M. S., Schwartz, K., Fardeau, M., Tryggvason, K., and Guicheney, P. (1995) Nat. Genet. 11, 216-218). The gene is over 260,000 base pairs and contains 64 exons. The sequence of all exon-intron borders was determined. Two of the exons, i.e. exons 43 and 52, are extremely small in size, 6 and 12 base pairs, respectively. Comparison of the exon pattern of the human LAMA2 gene with that of the Drosophila LAMA gene revealed that only 2 of 63 intron locations in the 5′-end of the human gene match the intron locations in the Drosophila gene, which contains 14 introns.

Laminins are a family of large trimeric basement membrane glycoproteins composed of α-, β-, and γ-chains (1-3). The three subunit chains associate at their carboxyl termini in a coiled coil, usually forming a cross-shaped molecule in which the coiled coil forms the long arm and the amino termini form the short arms. To date, five genetically distinct α-chains, three β-chains, and two γ-chains have been identified (1-7). The complete sequence of all of the human chains except for α5 has been determined (6, 8-16). The laminin chains form a variety of isoforms that may vary extensively with respect to their tissue distribution. There is still little knowledge about the regulation of tissue-specific expression of laminin genes, and the structure of the human genes has only been determined for the β1-chain (17), γ1-chain (18), and γ2-chain (19). These studies have revealed considerable structural divergence between the β- and γ-chain genes. As yet, the structure of no mammalian α-chain gene has been reported, but the Drosophila α-chain gene has been shown to contain 15 exons (20).
The human laminin genes are quite dispersed in the genome, but many of them are located in proximity to each other, which, in turn, indicates their evolutionary relationship. Thus, LAMA1 and LAMA3, encoding the α1- and α3-chains, respectively, are located on the same chromosome 18, but distantly from each other at 18p11.3 and 18q11.2, respectively (11,21). LAMA2 and LAMA4 are located close to each other at 6q22-23 and 6q21, respectively (5, 10). The β-chain genes are all located on different chromosomes, with LAMB1 at 7q22 (12), LAMB2 at 3p21 (13,22), and LAMB3 at 1q32 (23). The two γ-chain genes, LAMC1 and LAMC2, are located in very close proximity to each other, both genes being located at 1q25-31 (16,24).
The laminin-2 and laminin-4 isoforms (4), which have the molecular formulas α2:β1:γ1 and α2:β2:γ1, respectively, are characteristically enriched in basement membranes surrounding skeletal muscle fibers (25,26). The high tissue specificity of these isoforms is provided only by the α2-chain, previously termed merosin (8), as the other component chains of this laminin isoform, the β1- or β2- and γ1-chains, are quite ubiquitous. The α2-chain is expressed widely in basement membranes of skeletal muscle, both at neuromuscular synapses and extrasynaptically as well as in the myotendinous junctions (27-29). The skeletal muscle-specific location of the α2-chain implies a specific function for muscle development and function, and this chain has been shown to bind to the sarcolemma protein complex dystroglycan, which, in turn, interacts with the cytoskeleton component dystrophin and extracellular laminin (30,31). Mutations in the genes for dystrophin have previously been shown to cause Duchenne's muscular dystrophy (32), and mutations have also been identified in components of the dystroglycan protein complex in other types of muscular dystrophy (33-37). Recently, mutations were described also in the laminin α2-chain gene (LAMA2) in muscular dystrophy in mice (38), and more recently, we identified mutations in human patients with congenital muscular dystrophy (39,40). These findings demonstrate the role of the α2-chain in skeletal muscle function.
Detailed characterization of the LAMA2 gene is essential for studies on gene regulation, analysis of mutations, and studies on the pathogenesis of congenital muscular dystrophy. In this work, we have determined the entire exon pattern of the human LAMA2 gene and shown it to exceed 260,000 base pairs in size and to contain 64 exons.

Isolation of Genomic Clones and Gene Mapping-Three human λ-phage libraries (CLONTECH HU2004j, HL1111j, and HL1067j) were screened to isolate genomic LAMA2 clones. The libraries were screened with 32P-labeled human laminin α2-chain cDNA inserts (10) according to standard procedures (41). Purified clones were characterized by restriction mapping and hybridization with different laminin α2-chain cDNA fragments or sequence-specific oligonucleotide probes, and appropriate restriction fragments were subcloned into the pBluescript SK vector for further analysis.
DNA Sequencing and Estimation of Intron Sizes-Sequencing of exons and exon-intron boundaries was carried out on purified λ-phage clones directly or, in some cases, on subcloned restriction fragments of genomic clones by using an AmpliCycle kit (Perkin-Elmer) and a Cycle Sequencing kit (Pharmacia Biotech Inc.). Sizes of introns were assessed by determination of the size of electrophoresed fragments polymerase chain reaction-amplified from genomic DNA or by Southern hybridization and mapping of restriction fragments.

* This work was supported in part by grants from the Swedish Medical Research Council, the Academy of Finland, and the Sigrid Juselius Foundation, Finland. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EBI Data Bank with accession numbers U66733-U66796.

RESULTS
Characterization of Genomic Clones-We isolated and further characterized a total of 32 genomic clones from the λ-phage libraries using our previously isolated cDNA clones (11). These clones, which contain all exons of the gene, were aligned and mapped using restriction enzymes (Fig. 1). The overlapping clones spanned 260,000 base pairs of genomic DNA and the entire gene, with the exception of parts of introns 2, 3, 4, and 21 not contained in the clones.

Assignment of Exons to EcoRI Restriction Fragments-Since the human LAMA2 gene has been shown to be mutated in patients with congenital muscular dystrophy (39,40), it is useful to know the location of exons in restriction fragments of the gene when analyzing break points of deletions or other rearrangements of the gene in patients. To facilitate such analyses, we have assigned all exons to EcoRI restriction fragments in the gene (Table II).

DISCUSSION
This work, describing the entire exon sequence pattern of the human LAMA2 gene, provides the first structure of a mammalian laminin α-chain gene. The presence of 64 exons in this gene, which encodes a 9500-nucleotide transcript and the 3110-residue α2-polypeptide chain, shows that the gene is considerably larger and more complex than the human genes coding for the smaller β- and γ-chains. Thus, the LAMB1 gene of the 1786-residue β1-chain (12) has 34 exons (17); the LAMC1 gene of the 1609-residue γ1-chain (15) has 28 exons (18); and the LAMC2 gene of the 1172-residue γ2-chain (16) has 23 exons (19). In contrast, the LAMA gene of the 3712-residue α-chain of Drosophila is considerably more compact, containing only 15 exons (20). Our previous work (17,18) has indicated that the structures of the genes for the β- and γ-type laminin chains have diverged extensively. For example, the exon size pattern of the LAMB1 gene differs significantly from that of LAMC1 and LAMC2, with the latter two showing considerable similarity in gene structure between themselves. The present results show that the α-chain genes have diverged extensively from the β- and γ-chain genes since the LAMA2 gene exon size pattern is very different from that of the LAMB1, LAMC1, and LAMC2 genes. It is likely, however, that the different α-chain genes exhibit considerable structural homology between themselves, as do the two human γ-chain genes (18,19).
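The difference in compactness between these genes can be made concrete with a small calculation. The following Python sketch is an illustration added here, not part of the original analysis: it divides the polypeptide lengths quoted above by the corresponding exon counts. The function name `residues_per_exon` is hypothetical, and the measure is only approximate because it ignores untranslated regions carried by the first and last exons.

```python
# Polypeptide length (residues) and exon count for each gene,
# taken from the figures quoted in the text above.
GENES = {
    "LAMA2 (human α2)": (3110, 64),
    "LAMB1 (human β1)": (1786, 34),
    "LAMC1 (human γ1)": (1609, 28),
    "LAMC2 (human γ2)": (1172, 23),
    "LAMA (Drosophila α)": (3712, 15),
}


def residues_per_exon(residues, exons):
    """Average number of encoded residues per exon (rough measure only)."""
    return residues / exons


for name, (residues, exons) in GENES.items():
    print(f"{name}: {residues_per_exon(residues, exons):.1f} residues per exon")
```

With these inputs the four human genes come out at roughly 49 to 57 residues per exon, whereas the Drosophila gene averages about 247, which is one way to express the statement that the Drosophila gene is considerably more compact.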
It was of interest to compare the structural relationship of the >260,000-base pair multiexon human LAMA2 gene with that of the more compact 14,000-base pair Drosophila gene. This comparison (Fig. 3) revealed that the location of intervening sequences in the two genes is poorly conserved as only two intron locations, i.e. introns 3 and 6, in the human gene match introns 2 and 3 in the Drosophila gene. This shows that the α-chain genes in higher species have become more complex by splitting of the coding sequences due to the uptake of new noncoding sequences.
The exact size of the human LAMA2 gene was not established from the λ-phage clones studied as they did not span the entire introns 2, 3, 4, and 24. However, the overlapping clones covering the rest of the gene spanned 260,000 base pairs, demonstrating that the gene is large. This is not uncommon for genes of the structural components of basement membranes, which, in general, have been shown to be large. For example, the complete size of the COL4A6 gene for the type IV collagen α6-chain has been shown to be 425,000 base pairs and to contain 46 exons (42). In that gene, intron 2 was estimated to be 340,000 base pairs. The human LAMC2 gene is 55,000 base pairs with 23 exons (19). The complete sizes of other type IV collagen or laminin genes have not been elucidated because of the presence of large introns not contained in the clones isolated. However, the sizes of such genes whose entire exon pattern has been determined are all large. Thus, the LAMB1 gene is >80,000 base pairs (17); LAMC1 is >60,000 base pairs (18); COL4A1 is >100,000 base pairs (43); and COL4A5 is ~250,000 base pairs (44,45).
The human laminin α2-chain gene has been shown to be mutated in patients with congenital muscular dystrophy. All three mutations reported to date are single base changes leading to splice site, nonsense, or missense mutations (39,40). The present work provides the basis for detailed analysis of the gene in the disease. In the case of large gene rearrangements, the assignment of exons to specific EcoRI restriction fragments facilitates the analysis of break points. However, as has turned out to be the case for most genetic diseases, the majority of mutations are small gene changes such as deletions or insertions or single base changes within or adjacent to exons. Therefore, determination of the sequences of exon-intron boundaries is essential for mutational analysis.
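The idea of assigning exons to EcoRI fragments can be pictured as an interval lookup. The Python sketch below is an illustrative assumption, not the procedure used in this work: it locates EcoRI recognition sites (GAATTC, cut between G and A) in a sequence, builds the resulting fragments, and reports which fragment or fragments each exon overlaps. All function names and the coordinates in the example are hypothetical and are not taken from Table II.

```python
from bisect import bisect_right

ECORI_SITE = "GAATTC"  # EcoRI cuts between G and A: G^AATTC


def find_ecori_cuts(seq):
    """Return 0-based cut positions (index of the base just after the cut)."""
    seq = seq.upper()
    cuts = []
    i = seq.find(ECORI_SITE)
    while i != -1:
        cuts.append(i + 1)  # cut falls after the leading G
        i = seq.find(ECORI_SITE, i + 1)
    return cuts


def ecori_fragments(cut_positions, region_length):
    """Turn cut positions into half-open (start, end) fragment intervals."""
    cuts = sorted({p for p in cut_positions if 0 < p < region_length})
    bounds = [0] + cuts + [region_length]
    return list(zip(bounds[:-1], bounds[1:]))


def assign_exons(exons, fragments):
    """Map each exon name to the indices of the fragments it overlaps.

    `exons` maps a name to a half-open (start, end) interval in the same
    coordinate system as `fragments`.
    """
    starts = [start for start, _ in fragments]
    assignment = {}
    for name, (ex_start, ex_end) in exons.items():
        first = bisect_right(starts, ex_start) - 1
        last = bisect_right(starts, ex_end - 1) - 1
        assignment[name] = list(range(first, last + 1))
    return assignment


# Hypothetical example: three cut positions in a 12,000-bp region.
fragments = ecori_fragments([1000, 4000, 9000], 12000)
exons = {"exon A": (1200, 1350), "exon B": (3950, 4100), "exon C": (9000, 9006)}
print(assign_exons(exons, fragments))
```

In this hypothetical layout, an exon that spans an EcoRI site (exon B) is assigned to two adjacent fragments, which is the situation in which a rearrangement break point near that exon would alter the sizes of both fragments.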
In conclusion, this study has provided the first exon-intron structure of a mammalian laminin α-chain gene. This work has direct practical applications as the gene has been shown to be mutated in congenital muscular dystrophy; and therefore, this work provides the basis for mutational analysis and even future gene therapy. Also, knowledge about gene structure is a prerequisite for studies on the regulation of gene expression.