Structural organization of the human and mouse laminin beta2 chain genes, and alternative splicing at the 5' end of the human transcript.

We have determined the structural organization of the human and mouse genes that encode the laminin β2 chain (s-laminin), an essential component of the basement membranes of the neuromuscular synapse and the kidney glomerulus. The human and mouse genes have a nearly identical exon-intron organization and are the smallest laminin chain genes characterized to date, due to the unusually small size of their introns. The laminin β2 chain genes of both species consist of 33 exons that span ≤12 kilobase pairs of genomic DNA. The exon-intron pattern of the laminin β2 chain gene is also highly similar to that of the human genes encoding the homologous laminin β1 and β3 chains. The putative promoter regions of the human and mouse laminin β2 chain genes have features characteristic of the promoters of genes that have a limited tissue expression. Considerable conservation of the intron sequences of the mouse and human genes was observed. The first intron of the human gene, located 1 base pair upstream of the translation start codon, contains a non-consensus 5′ splice site. This intron was shown to be inefficiently spliced in humans, suggesting that post-transcriptional mechanisms may be involved in the regulation of laminin β2 chain gene expression.

Basement membranes in different tissues perform a variety of functions, including the filtration of macromolecules and the anchorage of epithelial cells to the underlying stroma (1). They also provide signals for the migration and differentiation of cells during development and are necessary for the maintenance of normal tissue function in the adult (2). Heterogeneity in basement membrane structure and function may be mediated in large part by tissue-specific variations in the expression pattern of the subunits of the two main components, laminin and type IV collagen. Laminin is the major non-collagenous component of basement membranes and is able to polymerize into a network that is cross-linked with type IV collagen by the single-chain protein entactin/nidogen (3). The laminin molecule is a heterotrimer of three non-identical chains, called the α, β, and γ chains according to the new nomenclature (4). The three chains are members of a multigene family, and so far five α, three β, and two γ chains have been identified in vertebrates (4-6). All laminin chains are multidomain glycoproteins that contain a 570-600-amino acid α-helical region that forms a triple-stranded coiled-coil with the other two chains (reviewed in Ref. 7). The N-terminal regions of the chains are heterogeneous, consisting of alternating globular and cysteine-rich domains of varying size and number. The α chains contain an additional C-terminal globular domain of approximately 100 kDa. The laminin chains can assemble into at least seven different laminin isoforms (4) that have distinct tissue distributions (8-9).
The three members of the β chain family are characterized by the presence of a 32-34-amino acid α domain that interrupts the α-helical region (10). The β1 (formerly B1) chain is widely expressed, while the homologous β2 (formerly s-laminin) and β3 (formerly B1k) chains have a more restricted expression pattern. The mature β1 and β2 chains consist of 1765 and 1766 amino acids, respectively, that are 50% identical in amino acid sequence and have a similar domain structure (10-14), while the 1155-amino acid β3 chain is 36% identical to the other two and is truncated in the N-terminal portion (15,16). The β3 chain assembles with the α3 and γ2 chains to form laminin-5 or kalinin, a component of the basement membranes of stratified squamous epithelia (see Ref. 17, and references therein). Mutations in the laminin β3 chain gene have been identified in some individuals with the lethal skin disease Herlitz junctional epidermolysis bullosa (18). The β1 and β2 chains can substitute for one another in complexes containing the γ1 and either the α1, α2, or α3 chains (19-21), or an uncharacterized α chain present in the kidney (22). Despite their similarities, the β1 and β2 chains do not appear to be functionally redundant. The two chains have a different and sometimes mutually exclusive tissue distribution, with the β2 chain enriched in the basement membranes of the skeletal muscle neuromuscular junction, kidney glomerulus, nerve fascicle perineurium, and vascular smooth muscle (8,12,13). During development, the β1 chain is expressed early in the glomerular, perineural, and arterial smooth muscle basement membranes, but at later stages is replaced by the β2 chain (23-25). Direct evidence of a distinct role for the β2 chain was obtained in transgenic mice in which the β2 chain gene was inactivated by homologous recombination. The mice, which died within 1 month after birth, displayed defects in the architecture and function of the neuromuscular synapse (26) and developed massive proteinuria due to impaired filtration by the glomerular basement membrane (27).
The role of the laminin β2 chain in neuromuscular function has been the object of considerable interest. The β2 chain was originally identified by a monoclonal antibody that specifically stained the neuromuscular synapse (12), and during development it has been detected in regions where neural morphogenesis and migration occur (28,29). The rat laminin β2 chain was found to promote the selective adhesion of motor neurons, and this activity was mapped to the sequence leucine-arginine-glutamine in the C terminus (30,31). In addition to the neuromuscular phenotype observed in the β2 chain knock-out mice, recent studies implicate alterations in the levels of the β2 chain in the pathogenesis of two forms of inherited muscular dystrophy in humans. The β2 chain was increased and the β1 chain reduced in the skeletal muscle basement membranes of patients with autosomal recessive muscular dystrophy associated with adhalin deficiency (32). We have shown that the laminin β2 chain and adhalin are deficient in the skeletal muscle basement membrane of two individuals with the autosomal recessive disease Walker-Warburg syndrome (congenital muscular dystrophy accompanied by brain and eye malformations; Ref. 33).
Currently little is known about the mechanisms responsible for the tissue- and developmental stage-specific regulation of the laminin β2 chain gene. To provide a foundation for the identification of elements involved in the control of laminin β2 chain gene expression, we have isolated and characterized genomic clones that encode the entire human and mouse β2 chain genes. We have determined the structural organizations of the genes, compared the sequences of their putative promoter regions and introns, and demonstrated that the human gene displays an unexpected heterogeneity in the 5′-untranslated region due to alternative splicing of the first intron.

Isolation and Characterization of Laminin β2 Chain Genomic Clones-A human genomic DNA library in λFIX (Stratagene) was screened by plaque hybridization (34) to a human β2 laminin cDNA clone (35) labeled with [α-32P]dCTP using a random primed labeling kit (Amersham Corp.). Two positive phage, λ1 and λ2, were plaque-purified, and large scale phage DNA was prepared (34). Restriction fragments were subcloned into plasmid vectors (pGEM-3Z and -4Z, Promega; or pBluescript SK(+), Stratagene) for further analysis by restriction enzyme mapping and sequencing. The human genomic DNA sequence was obtained using Sequenase enzyme and reagents (Amersham) and analyzed using the software programs of the Wisconsin Package, version 8, of the Genetics Computer Group (36).
Using full-length rat laminin β2 cDNA (12) as a probe, two mouse genomic clones were isolated that together contained all of the coding exons of the gene and 4 kb upstream of the translation start site. One clone, 31F1, isolated from an NIH/3T3 library in λFIX (Stratagene) contained exons 1-30. A second clone, 10.8, isolated from a PCC4 library in λFIX (Stratagene) contained exons 1-33. Restriction fragments from these clones were subcloned into pBluescript for further analysis.
Primer Extension Mapping of the Transcription Start Site of the Human Gene-Total RNA was extracted from the human Clone A colon carcinoma cell line according to the method of Chirgwin et al. (37). A 21-nt oligomer (5′-dGCGACTTTGAGCAAAGTTGGG), complementary to nt 60-80 of the cDNA (13), was labeled with [γ-32P]ATP using T4 polynucleotide kinase. Forty micrograms of Clone A total RNA was annealed to 5 ng of the labeled primer for 2 h at 50°C according to the method of Boorstein and Craig (38). The annealed primer was extended with 25 units of avian myeloblastosis virus reverse transcriptase (Boehringer Mannheim) in a 50-µl reaction mixture containing 50 mM Tris-HCl (pH 8.5 at 20°C), 8 mM MgCl2, 30 mM KCl, 6 mM dithiothreitol, 50 µg of bovine serum albumin, 0.5 mM each dNTP, and 40 units of RNasin (Promega). After hydrolysis of the RNA and ethanol precipitation, the product was dissolved in formamide-containing buffer and analyzed on a 7 M urea, 6% polyacrylamide sequencing gel. For determination of the size of the product, the adjacent lanes contained sequencing reactions performed using the same 32P-labeled primer on a 2.3-kb plasmid subclone of λ2 that contained exons 1-3 and 1.6 kb of the 5′-flanking DNA. The sequencing gel was exposed to x-ray film at −70°C with intensifying screens.
RNA Analysis by Reverse Transcriptase and PCR-Total RNA was extracted from various human and rat tissues and cell lines as described above. Cytoplasmic RNA was prepared by lysing Clone A colon carcinoma cells in buffer consisting of 10 mM KPO4 (pH 7.9), 4 mM MgCl2, 4 mM EGTA, 0.5 mM dithiothreitol, 0.1% monothioglycerol, 0.25 M sucrose. After homogenization with 10 strokes of a Dounce homogenizer (pestle A) and addition of Nonidet P-40 to a final concentration of 0.4%, the nuclei were pelleted by centrifugation at 1000 × g for 10 min at 4°C, and RNA was extracted from the supernatant by a modification of the method of Chomczynski and Sacchi (39). Synthesis of cDNA was carried out as described previously (35), and amplification was performed with Ampli-Taq polymerase and buffer from Perkin-Elmer. RT-PCR analysis of the human laminin β2 chain transcript employed the sense primer 5′-dGAGGGAAATAGGCCAAAG (nt 139-156 of the cDNA sequence) and either antisense primer 1 (5′-dTGACTGACGATGCAGTAGGGC, complementary to nt 387-407 of the cDNA) or antisense primer 2 (5′-dGCACTTCTTTTCGTCCTGCA, complementary to nt 410-429 of the cDNA). RT-PCR analysis of the rat transcript was performed using the sense primer 5′-dAGGGAAACCAGCCCAGTACC (nt 42-61 of the rat cDNA; Ref. 12) and antisense primer 5′-dTCGGGAGTCACACAGGAAG (complementary to nt 341-359 of the rat cDNA). Thirty-five cycles were performed (denaturation, 95°C for 45 s; annealing, 55°C for 60 s; and extension, 72°C for 60 s), followed by a final extension period of 5 min at 72°C. The PCR products were analyzed on a 1% agarose gel. For sequence analysis, the bands were gel-purified and subcloned into pCR-Script (Stratagene) according to the manufacturer's specifications.

Genomic Organization of the Human and Mouse Laminin β2 Chain Genes-The entire human laminin β2 chain gene was obtained in two overlapping genomic clones that spanned a total of 17 kb of DNA and extended 2 kb upstream of the translation start site and 3 kb downstream of the poly(A) addition site (Fig. 1A). Two mouse clones were isolated that covered 20 kb of genomic DNA and contained the complete mouse laminin β2 chain gene, 4 kb upstream of the ATG codon, and 6 kb downstream of the TGA codon (Fig. 1B). The exon-intron patterns of the human and mouse genes were determined by sequencing of all the exons and introns except for the largest intron, intron 16. The two genes were found to have a nearly identical structural organization, summarized in Figs. 2 and 3. Both genes consisted of 33 exons that varied in size from 64 bp (exon 7) to 373 bp (exon 25). Excluding the first and last exons, whose precise sizes were not determined in the mouse, the sizes of only three exons differed between the human and mouse genes. The second exon of the mouse gene was 9 bp larger than that of the human gene, since the signal peptide of the mouse β2 chain polypeptide was 3 amino acids longer than that of the human. The sizes of exons 14 and 15 varied by 1 bp between the two species, since the position of intron 14 of the mouse gene was shifted 1 bp to the left, occurring after the second nt of the codon for Gly 545 instead of after the third.
The exons of the human gene extended over approximately 12 kb of genomic DNA, which is slightly more than twice the sum of the exon lengths (5721 bp). For the mouse gene, the distance from the ATG translation start codon to the TGA termination codon was approximately 10 kb. The compact sizes of the human and mouse laminin β2 chain genes were due to the unusually small lengths of the introns; each gene possessed only one intron larger than 500 bp, and 19 of the 32 introns were less than 100 bp. The sizes of the corresponding introns of the human and mouse genes were generally quite similar, except that intron 16 of the human gene was roughly 3 times the size of the mouse equivalent and intron 30 of the mouse gene was more than twice the size of its human counterpart. The sequences of all the splice junctions of the two genes obeyed the GT/AG rule except for the first intron of the human gene, which began with the sequence GC as the 5′ splice donor site (see below).
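The compactness figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming the approximate 12-kb span and the 5721-bp exon total given in the text; the function and constant names are illustrative and do not come from the paper.

```python
def intron_to_exon_ratio(gene_span_bp: int, exon_total_bp: int) -> float:
    """Return total intron length divided by total exon length."""
    intron_total_bp = gene_span_bp - exon_total_bp
    return intron_total_bp / exon_total_bp


# Values taken from the text; the 12-kb span is approximate.
HUMAN_GENE_SPAN_BP = 12_000
HUMAN_EXON_TOTAL_BP = 5_721

# "Slightly more than twice the sum of the exon lengths".
span_over_exons = HUMAN_GENE_SPAN_BP / HUMAN_EXON_TOTAL_BP
ratio = intron_to_exon_ratio(HUMAN_GENE_SPAN_BP, HUMAN_EXON_TOTAL_BP)

print(f"gene span / exon total: {span_over_exons:.2f}")
print(f"intron : exon ratio:    {ratio:.2f} : 1")
```

Because the gene span is only given to the nearest kilobase, the resulting ratio should be read as an estimate rather than an exact value.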
Mapping of the 5′ End of the First Exon of the Human Laminin β2 Chain Gene-The transcription start site of the human laminin β2 chain gene was determined by primer extension. First strand cDNA synthesis was performed using a radiolabeled oligonucleotide complementary to nt 60-80 of the cDNA sequence (nt 108-128 in Fig. 6) and total RNA from the Clone A human colon carcinoma cell line, which had been shown to express relatively high levels of laminin β2 chain mRNA by Northern blotting (35). Gel analysis of the primer extension products yielded a single band of 128 bp (Fig. 4), indicating that the first exon was 212 bp in size and that the gene possessed one major transcription start site.
Coding Features of the Exons of the Mouse and Human Laminin β2 Chain Genes-The first exons of the human and mouse laminin β2 chain genes encoded only the 5′-untranslated regions, and the ATG translation start codons were located in exon 2, 1 bp downstream from the junction of the first intron. The human gene encodes a 1798-amino acid protein that includes a 32-residue signal peptide, while the mouse gene encodes an 1801-amino acid polypeptide with a 35-residue signal peptide. The derived amino acid sequence of the mouse gene is 95% identical to that of the rat (12) and 87% to that of the human (data not shown). The protein coding features of the exons were identical for both the mouse and human genes (Fig. 5A). Exons 2 and 3 encode the signal peptide, and exons 3-8 encode the globular domain VI, which is believed to be involved in the self-assembly of laminin monomers into a network (3). The two cysteine-rich repeat domains were encoded by exons 8-14 (domain V) and exons 19-25 (domain III). The two cysteine-rich domains are separated by a globular domain of unknown function, domain IV, that was encoded by exons 14-18. The two α-helical regions, domain II and domain I, were encoded by exons 25-27 and by exons 28-33, respectively. The first 102 bp of exon 28 also encoded the 34-amino acid α domain that separates the two α-helical domains. The translation stop codon was found in exon 33. The final exon of the human gene also contained the 111-bp 3′-untranslated region and the polyadenylation site. Downstream of the poly(A) addition site the human genomic DNA contained the sequence TGTGTTGT, which is similar to a proposed consensus sequence, YGUGUUYY, for cleavage and polyadenylation of mRNA (40).
Conservation of the Structural Organization of the Laminin β1, β2, and β3 Chain Genes-Comparison of the locations of the introns of the human laminin β2 chain gene with those of the human genes encoding the homologous β1 (41) and β3 chains (16) shows that the structural organization of the three human β chain genes has been highly conserved (Fig. 5A). The positions of the first two introns in each gene are non-equivalent; the first introns of the β1 and β3 chain genes are found 86 and 37 bp, respectively, upstream of the translation start codons, and the second introns interrupt the signal peptides in different positions (data not shown). However, when the sequences of the mature polypeptides of the three chains were aligned to optimize the sequence identity, the positions of all the introns were identical (Fig. 5B), except that the β2 chain gene was found to lack the intron corresponding to intron 28 of the β1 chain gene and intron 17 of the β3 chain gene, resulting in the fusion of two exons into one in this region of the β2 chain gene. The sizes of the homologous exons of the three genes were not always identical, due to insertions or deletions of a few amino acids. The phases of the homologous introns were also conserved in the three chains, with the exception that the glycine codon at residue 1295 of the laminin β1 chain is interrupted by a phase 2 intron, while the equivalent glycine codon in the β2 and β3 chains is split by a phase 1 intron. Many of the boundaries between the protein domains of the three chains do not correspond with the location of introns, as was noted previously for the human laminin β1 and β3 chain genes.
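The intron phases compared above follow the convention used in this paper's legend: phase 1, 2, and 0 introns occur after the first, second, and third nucleotides of a codon, respectively. A minimal sketch of that rule follows, using the Gly 545 position of intron 14 described earlier as the worked case; the helper names are hypothetical.

```python
def coding_position(residue_number: int, nt_in_codon: int) -> int:
    """1-based coding position of nucleotide 1, 2 or 3 of a given codon."""
    if nt_in_codon not in (1, 2, 3):
        raise ValueError("nt_in_codon must be 1, 2 or 3")
    return 3 * (residue_number - 1) + nt_in_codon


def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase of an intron that follows the given coding nucleotide.

    Phase 1 and 2 introns split a codon after its first or second
    nucleotide; phase 0 introns lie between two codons.
    """
    return coding_nt_before_intron % 3


# Intron 14: after the third nt of the Gly 545 codon in the human gene,
# after the second nt in the mouse gene (shifted 1 bp to the left).
human_phase = intron_phase(coding_position(545, 3))
mouse_phase = intron_phase(coding_position(545, 2))
print(human_phase, mouse_phase)
```

Only the position within the codon matters here, so the result does not depend on whether residues are numbered from the initiator methionine or from the signal peptide cleavage site.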
[Figure legend] The size of the first intron was determined by primer extension (Fig. 4). The sizes of introns 1-15 and 17-33 were determined precisely by sequencing the intron DNA. The size of intron 16 was estimated by restriction enzyme mapping. Phase 1, 2, and 0 introns occur after the first, second, and third nucleotides of a codon, respectively. The amino acids are numbered starting with the first residue after the predicted signal peptide cleavage site.

Sequence Analysis of the 5′ Ends of the Human and Mouse Laminin β2 Chain Genes-The sequences of approximately 1.1 kb upstream of the transcription start site of the human gene and of approximately 0.9 kb of the corresponding region of the mouse gene were determined (Fig. 6). Both genes lacked a canonical TATA box upstream of the predicted transcription start site, but an AT-rich element (AATAAA) was found 40 bp upstream of the transcription start site of the human gene and was also present in the homologous region of the mouse sequence. Neither gene contained a CCAAT box or a "CTC" box (CCCTCCC), an element found in the regulatory regions of several extracellular matrix protein genes (42,43). The promoter region and the first exon of the human gene were GC-rich (57% and 63% GC, respectively), and the sequence from nt −452 to −760 was characteristic of a CpG island (see Ref. 44, and references therein), as it had a 64% G+C content and an observed/expected CpG ratio of 0.67. The sequence of the mouse 5′-flanking DNA was less GC-rich (51%) than the human, but it did contain a CpG island between nucleotides 56 and 438. The relatively short CpG islands present in the 5′ ends of the human and mouse β2 chain genes are characteristic of genes with a limited tissue expression pattern (44). The 5′ end of the human gene contained several potential binding sites for transcription factors that recognize GC-rich elements. Two SP1 binding sites (CCGCCC) and six potential binding sites for transcription factor AP-2 (45) were found in the 5′-flanking DNA and the first exon. The stretches of GC residues in the 5′ end of the human gene could possibly bind other transcription factors that recognize GC-rich sequences, such as ETF and GCF (46). The 5′-flanking DNA of the human gene contained nine copies of the sequence CCCCCA and two copies of the complementary sequence TGGGGG; four of these elements overlapped with putative AP-2 binding sites, and their functional significance is not known. The human gene also contained an Alu repetitive element that terminated at −1047 and was followed by a tract of 28 A residues. The sequence of the putative promoter region of the mouse gene was only 39% identical to that of the human and lacked the potential SP1 and AP-2 binding sites and the CCCCCA motifs.
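The CpG island description above rests on two quantities, the G+C content and the observed/expected CpG ratio. The sketch below assumes the commonly used definition, observed/expected = (number of CpG × sequence length) / (number of C × number of G); the text does not state which formula or window size was applied, and the short test sequence is made up for illustration.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    gc_fraction = (c_count + g_count) / length
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    obs_exp = cpg_count * length / (c_count * g_count)
    return gc_fraction, obs_exp


# Hypothetical sequence, not taken from the LAMB2 gene.
example = "GGCGCCCGCGGATCCGCGTTACGCAGCCCGGG"
gc, ratio = cpg_stats(example)
print(f"G+C = {gc:.0%}, observed/expected CpG = {ratio:.2f}")
```

Applied to the human region between −452 and −760, a calculation of this kind would be expected to give values near the 64% and 0.67 reported above, provided the same formula was used.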
Comparison of the Intron Sequences of the Mouse and Human Genes-Since the putative promoter regions of the mouse and human genes showed relatively little sequence similarity, we compared the sequences of the introns of the two genes, except for that of the largest, which was not completely sequenced in either species. The sequence identity varied from 45% for intron 6 to 82% for intron 27, with an average of 62% identity (Fig. 7A). A more detailed comparison of one of the highly conserved introns, intron 7, is shown in Fig. 7B. While the overall sequence homology of the mouse and human seventh intron is 74%, there is a stretch of 114 nt in which the mouse and human sequences are 89% identical. Within this conserved region, both genes contain a sequence (AGGTCTNNNNAGGTGA) that is similar to a thyroid hormone response element (AGGTCANNNNAGGTCA; Ref. 48) and two E-boxes (CANNTG), which are recognition sites for helix-loop-helix proteins (49). Each gene also contains an additional E-box in intron 7, downstream of the highly conserved region. The helix-loop-helix family of transcription factors includes regulators of muscle-specific gene expression such as MyoD (49).
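Motifs written with N placeholders, such as the E-box (CANNTG) and the response element-like sequence described above, can be located with a simple pattern scan. The following is a minimal sketch rather than the method used in the study; the function names and the test sequences are hypothetical, and only the given strand is scanned.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a motif with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Made-up test sequences containing the motifs named in the text.
e_boxes = find_motif("ttCAGCTGaaCACGTGtt", "CANNTG")
tre_like = find_motif("AGGTCTacgtAGGTGA", "AGGTCTNNNNAGGTGA")
print(e_boxes)
print(tre_like)
```

The E-box pattern is its own reverse complement, so a single-strand scan suffices for it; asymmetric motifs would also need a scan of the complementary strand.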
Alternative Splicing in the 5′-Noncoding Region of the Human Laminin β2 Chain Gene-Since the first intron of the human β2 chain gene began with a non-consensus 5′ splice site, we investigated whether it was efficiently spliced out. When RT-PCR was performed using primers that flank the first intron, two bands were detected when total RNA from several different human tissues and cell lines was analyzed (Fig. 8A). The bands are derived from reverse transcribed RNA rather than genomic DNA contamination because PCR amplification of genomic DNA would yield a larger product due to the presence of intron 2. The identity of the PCR products was confirmed by cloning and sequencing the bands (data not shown). The sizes of the PCR products are consistent with the existence of transcripts that contain the 86-bp first intron (377 bp) and of transcripts in which the intron has been spliced out (291 bp). The intensity of the bands indicated that the unspliced form is more abundant than the spliced form in the cell types assayed. Both transcripts were detected in cytoplasmic as well as total RNA (Fig. 8B), indicating that the unspliced form is transported to the cytoplasm. We also investigated the splicing pattern of the 89-bp first intron of the rat laminin β2 chain gene, which has a consensus 5′ GT splice site like that of the mouse (footnote 2). A single band of 318 bp, derived from the spliced transcript, was observed by RT-PCR analysis of RNA from several rat tissues (Fig. 8C), indicating that splicing of the first intron of the rat gene was complete. In contrast to the rodent gene, the human laminin β2 chain gene thus has the potential to encode transcripts that contain either a 213- or a 299-bp 5′-untranslated region. Both forms of the 5′ leader have a 62-63% GC content and the ability to form several stem-loop structures, as indicated by computer analysis of the RNA sequences.
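The product and leader sizes reported above follow from the primer coordinates given under RT-PCR analysis and from the exon and intron sizes. The sketch below reproduces that arithmetic; the function name is illustrative, and pairing the 291-bp product with antisense primer 2 is an inference from the coordinates rather than a statement in the text.

```python
def amplicon_size(sense_start_nt: int, antisense_end_nt: int) -> int:
    """Length of a PCR product spanning two cDNA coordinates (inclusive)."""
    return antisense_end_nt - sense_start_nt + 1


HUMAN_INTRON1_BP = 86   # first intron of the human gene
HUMAN_EXON1_BP = 212    # first exon, from primer extension

# Human: sense primer starts at nt 139; antisense primer 2 ends at nt 429.
human_spliced = amplicon_size(139, 429)
human_unspliced = human_spliced + HUMAN_INTRON1_BP

# Rat: sense primer starts at nt 42; antisense primer ends at nt 359.
rat_spliced = amplicon_size(42, 359)

# The ATG lies 1 bp downstream of the intron 1 junction in exon 2.
utr_spliced = HUMAN_EXON1_BP + 1
utr_unspliced = utr_spliced + HUMAN_INTRON1_BP

print(human_spliced, human_unspliced, rat_spliced)
print(utr_spliced, utr_unspliced)
```

The computed values agree with the 291- and 377-bp human products, the 318-bp rat product, and the 213- and 299-bp 5′-untranslated regions given in the text.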
The structural predictions also indicated that inclusion of the 86-bp first intron would dramatically alter the secondary structure surrounding the AUG translation start codon (Fig. 8D). As the translational efficiency of an mRNA is believed to depend primarily on the length and conformation of its 5′ leader (51), the two forms of the 5′-noncoding region of human laminin β2 chain mRNA could thus differ in their ability to promote translation of the polypeptide.

DISCUSSION

We have characterized the human and mouse genes that encode the laminin β2 chain, both of which were found to consist of 33 exons that occupied only 12 kb or less of genomic DNA. The mouse gene (Lamb2) has been mapped to the distal region of chromosome 9 (52), and the human gene (LAMB2) was assigned to chromosome 3p21 (13,14), a region of conserved synteny with mouse chromosome 9. Analysis of the structure and sequence of the human and mouse genes has shed light on the evolution of the β subfamily of laminin chain genes and revealed features that may be involved in the transcriptional and post-transcriptional regulation of laminin β2 chain gene expression.
The laminin ␤2 chain gene showed a high degree of structural homology to the human genes encoding the other two known members of the laminin ␤ chain family, the ␤1 chain gene that spans Ͼ80 kb on chromosome 7 (11,41) and the ␤3 chain gene of 29 kb located on chromosome 1 (16,53). The three ␤ chains genes were apparently derived by duplication of a common primordial ␤ chain gene, with the subsequent deletion of exons encoding domain IV and several flanking cysteine repeats in the ␤3 chain gene. The ␤2 chain gene also lost one intron in the 3Ј end of the gene, which must have occurred prior to the divergence of rodents and primates. Intron loss has been postulated to occur via homologous recombination between a gene and the product of its reverse transcribed RNA (54). The duplication event that yielded the ␤1 and ␤2 chain genes may have involved a large segment of the DNA, as at least two genes linked to LAMB2 at human chromosome 3p21, GNAI2 and MST1/HGFL, have paralogues (GNAI1 and HGF) that map in the vicinity of the LAMB1 locus at 7q22 (55). In Drosophila only one ␤ chain has been identified so far (56), which is equally homologous to the ␤1 and ␤2 chains of mammals (12). The 2 F. Loechel, unpublished observations.  5. Comparison of the location of introns in the genes encoding the human laminin ␤2, ␤1, and ␤3 chains. A, at the top is a schematic representation of the features encoded by the human laminin ␤2 chain cDNA. Solid lines represent the 5Ј-and 3Ј-untranslated regions, increased complexity of basement membranes in higher organisms evidently requires the existence of a family of ␤ chains whose members have evolved to perform specialized functions.
Following duplication, the laminin ␤1 and ␤2 chain genes must have acquired different regulatory elements in order to produce their distinctive tissue distribution patterns. The human and mouse ␤2 chain genes contained an AT-rich sequence upstream of the predicted transcription start site that could serve as a TATA box. In contrast, the promoter regions of the more widely expressed human (41) and mouse (57) laminin ␤1 chain genes lacked TATA box-like elements, as do the promoters of many housekeeping genes. Both the human ␤1 and ␤2 chain genes possessed potential bindings sites for SP1 and AP-2; however, the human ␤1 chain gene has six SP1 sites clustered close to the transcription start site, which could result in a higher level of basal promoter activity. The significance of the AP-2 sites in the promoter regions of the two genes is not clear, since expression of AP-2 in mouse embryos was found to be greatest in ectodermal tissues (58). The laminin ␤2 chain is co-expressed with the ␣5(IV) collagen chain in some tissues (23), but aside from the presence of SP1 and AP-2 sites, no obvious similarities were found between the 5Ј ends of the human ␤2 chain gene and the human ␣5(IV) collagen chain gene (59).
Comparison of the sequences of the putative promoter regions of the mouse and human genes ought to aid in the identification of elements involved in transcriptional regulation, as these would be expected to be conserved in the two species. The genomic DNA within 900 bp upstream of the predicted transcription start sites of the two genes, however, displayed little sequence homology. It therefore appears likely that the elements primarily responsible for transcriptional control of the laminin β2 chain gene are located further upstream of the transcription start site, or further downstream, in the introns or in the 3′-flanking DNA. Considerable evidence exists that some introns contain sequences that regulate transcription. Introns were found to increase the expression of several transgenes by 10-100-fold in mice (60), and enhancers have been identified in the introns of many genes, including the first intron of the α1(IV) collagen chain gene (61,62). Conservation of intron sequences in the same gene from different species has been observed, such as in the genes encoding the human and mouse α1(II) collagen chains (63). The high degree of sequence identity of some of the introns of the mouse and human laminin β2 chain genes argues that they may contain regulatory information, and potential transcription factor binding sites were identified in at least one intron of the mouse and human laminin β2 chain genes. Further studies will be needed to define the specific regulatory sequences within the introns of the laminin β2 chain gene.
[...] and boxes symbolize the protein domains. A solid box indicates the signal peptide; domains VI, IV, II, and I are designated by shaded boxes; and domains V, III, and α are indicated by open boxes. The individual cysteine-rich repeats in domains V and III are boxed and numbered. Below this, the exons of the laminin β2 chain gene have been aligned with the corresponding region of the cDNA, and dotted lines connect exon-intron junctions that coincide with the boundaries of the polypeptide domains. Also shown are diagrams of the exons of the human laminin β1 (41) and β3 (16) chain genes. The 5′ and 3′ ends of the β3 chain gene have been aligned with the homologous regions of the β2 and β1 chain genes, and a gap was introduced in exon 12. The 5′ end of exon 12 is homologous to exon 12 of the β1 and β2 genes, while the 3′ end is homologous to exon 23 of the β1 and β2 genes. Asterisks were used to mark the introns in the β1 and β3 chain genes that do not occur in homologous positions of the β2 gene.
The sequences at the 5′ ends of the human (h) and mouse (m) genes (starting 1096 bp upstream of the predicted transcription start site of the human gene and extending to the first 71 bp of exon 3) were aligned using the GAP program. The human sequence is numbered relative to the transcription start site as determined by primer extension (+1), and the mouse sequence is numbered consecutively. The exon sequences are in capital letters, and the noncoding sequences are in lowercase letters. The sequence from −1096 to −1047 of the human gene is part of an Alu repetitive element (47). The mouse genomic DNA sequences that were considered to be part of the first exon were inferred by homology to the human sequence and have not been verified experimentally. Vertical lines connect nucleotides that are identical in both species, and dots represent gaps introduced to maximize the sequence identity. The two SP1 binding sites (CCGCCC, at −851 and +49) and the six transcription factor AP-2 binding sites (45) in the human gene are underlined. The putative AP-2 sites at −863, −655, and −379 are on the top strand, and those at −674, −58, and +33 are found on the complementary strand. The derived amino acid sequences, starting at the ATG initiation codon in the second exon of both genes, are indicated by the one-letter amino acid code and are placed above the nucleotide sequence for the human gene and below the nucleotide sequence for the mouse gene. The amino acid sequences are numbered relative to the predicted signal peptide cleavage sites, indicated by downward arrows.
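Identifying potential transcription factor binding sites of the kind discussed here amounts, in its simplest form, to scanning both strands of a sequence for a short motif. The following is only a minimal sketch under stated assumptions: it uses the SP1 core hexamer CCGCCC given in the sequence-alignment legend, the function names are hypothetical, the test sequence is made up, and it is not the search procedure used in this study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = "CCGCCC"):
    """Report 0-based positions of an exact motif match on either strand.

    A "+" hit means the motif reads 5'->3' on the given (top) strand;
    a "-" hit means it reads 5'->3' on the complementary strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    motif_rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == motif_rc:
            hits.append((i, "-"))
    return hits


# Made-up sequence for illustration only (not from the laminin beta2 gene).
example = "aaCCGCCCttGGGCGGaa"
print(find_motif(example))
```

Exact matching of a hexamer is deliberately crude; a match by itself says nothing about whether a site is functional.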
[...] and rat embryo limbs (lane 4) was performed using rat-specific primers. An arrow indicates the single PCR product of 318 bp, which corresponds to transcripts in which the first intron has been spliced out. The φX174/HaeIII size markers are in lane 5. D, predicted secondary structures at the 5′ ends of the two alternatively spliced forms of the human laminin β2 chain mRNA. The FoldRNA program (50) was used to calculate the optimal RNA secondary structures of the sequence of exons 1 and 2 (−) or of exon 1, intron 1, and exon 2 (+). The Squiggles program was used to make graphical representations of the optimal secondary structures. The positions of the translation start codons (AUG) in the structural plots are indicated. The arrows show the junctions of the sequence of the first intron in (+). The 5′ and 3′ ends of the folded sequences are indicated.
In addition to harboring binding sites for transcriptional control factors, the introns of the laminin β2 genes could also modulate gene expression at the level of RNA processing. The unusually short introns of the laminin β2 chain gene endow it with the distinction of being the smallest laminin chain gene yet characterized. The intron-exon ratio of the human laminin β2 chain gene is approximately 1.1:1, whereas the ratio of average intron size to average exon size reported in a survey of genes was around 8:1 (64). The smallest introns of the human and mouse β2 chain genes are close to the lower limit of vertebrate intron sizes; few introns of less than 75 bp have been found (64). The lower limit on intron size arises because splicing requires a spacer of approximately 50 nt between the 5′ donor site and the branch point to accommodate the spliceosome complex (see Ref. 65, and references therein). Inefficient removal of some introns may provide control points for the regulation of splicing in a tissue- and developmental stage-specific fashion. We have shown that at least one intron in the human laminin β2 chain gene is inefficiently spliced: the first intron, which has a non-consensus GC 5′ splice site. Out of several thousand intron sequences analyzed, very few contain non-consensus 5′ splice sites (66). Single GC introns have been found in at least two other human genes encoding extracellular matrix proteins, the α1(IV) collagen (67) and α1(VII) collagen (68) chains. No information is available on whether efficient splicing of these introns takes place. As these non-consensus introns occur in the protein coding regions of the genes, failure to remove the introns would produce alterations in the polypeptide chains.
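The two intron properties discussed above, the dinucleotide at the 5′ splice site and intron length relative to the approximately 75-bp lower limit, are simple to check for any intron sequence. The sketch below is illustrative only: the function names are hypothetical, the 75-bp threshold is taken from the text as a rule of thumb rather than a hard limit, and the example sequences are made up rather than taken from the laminin β2 chain gene.

```python
def classify_donor_site(intron: str) -> str:
    """Classify the 5' splice site by the first two nucleotides of the intron."""
    dinucleotide = intron[:2].upper().replace("U", "T")
    if dinucleotide == "GT":
        return "consensus (GT)"
    if dinucleotide == "GC":
        return "non-consensus (GC)"
    return "other (" + dinucleotide + ")"


def is_below_size_limit(intron: str, limit_bp: int = 75) -> bool:
    """Return True if the intron is shorter than the assumed lower size limit."""
    return len(intron) < limit_bp


# Made-up intron sequences for illustration only.
typical_intron = "gtaagt" + "c" * 90 + "ag"
gc_intron = "gcaagt" + "c" * 60 + "ag"

for name, intron in [("typical", typical_intron), ("GC", gc_intron)]:
    print(name, classify_donor_site(intron), len(intron), is_below_size_limit(intron))
```

A check of this kind only flags candidate introns; whether a flagged intron is in fact inefficiently spliced has to be established experimentally, as was done here for the first intron of the human gene.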
Alternative splicing of the first intron of the human laminin β2 chain gene yielded transcripts with two 5′ leaders that differ significantly in their predicted secondary structure at the translation initiation site. The intron sequence does not introduce any upstream AUG codons that could interfere with translation of the polypeptide. The unspliced transcript is present in the cytoplasm and is thus not an inactive precursor that is sequestered in the nucleus. The translational efficiency of the two transcripts of the human laminin β2 chain gene can be assayed by determining whether both forms are associated with the translationally active membrane-bound polysomal fraction (69). The rodent laminin β2 chain genes apparently do not undergo alternative splicing in the 5′-untranslated region and may be under a different form of translational control than the human gene. The rat and mouse laminin β2 chain transcripts also differ from the human transcript in having a C residue at position −3 relative to the translation start codon; 97% of vertebrate mRNAs have a purine in this position (70). The rodent laminin β2 chain mRNAs may thus be inefficiently translated, which appears to be a common feature of mRNAs that encode proteins involved in regulating cell growth and differentiation (51).
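Two of the sequence features considered above can be checked directly on a transcript: whether the 5′ leader contains upstream AUG codons, and whether position −3 relative to the start codon is a purine. The sketch below assumes the convention that the A of the AUG is +1 and the preceding base is −1; the function names are hypothetical and the sequences are made up, not the laminin β2 chain leaders.

```python
from typing import List


def upstream_aug_positions(leader: str) -> List[int]:
    """Return 0-based positions of every AUG (or ATG) in a 5' leader sequence."""
    seq = leader.upper().replace("T", "U")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


def minus3_is_purine(mrna: str, start_index: int) -> bool:
    """Check whether the base at -3 relative to the start codon is A or G.

    start_index is the 0-based index of the A of the initiator AUG (+1),
    so position -3 is the base at start_index - 3.
    """
    if start_index < 3:
        raise ValueError("need at least 3 nucleotides upstream of the start codon")
    return mrna[start_index - 3].upper() in ("A", "G")


# Made-up sequences for illustration only.
purine_context = "GCCACCAUGG"      # A at position -3
pyrimidine_context = "GCCCCCAUGG"  # C at position -3

print(upstream_aug_positions("GGCCGCGGCC"))      # leader without any AUG
print(minus3_is_purine(purine_context, 6))
print(minus3_is_purine(pyrimidine_context, 6))
```

Neither check measures translational efficiency; they only identify features, such as a pyrimidine at −3, that the text associates with potentially inefficient translation.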
The means by which the synthesis of the various laminin chains is temporally and spatially regulated are largely unknown. The present study is the first to implicate control of laminin gene expression at the translational level via alternative splicing of the 5′-untranslated region of the mRNA. It will be of interest to determine how splicing and translation of the laminin β2 chain mRNA are involved in the switch from the β1 to the β2 chain that occurs in the basement membranes of several tissues during development (23-25). The existence of post-transcriptional regulatory mechanisms may explain previous observations that some cell lines contain detectable levels of laminin β2 chain mRNA but appear to incorporate little of the protein into the extracellular matrix (35,71). Following synthesis of the polypeptide there are no doubt further levels of control, such as the assembly of the β2 chain into complexes with the various α and γ chains, the transport of β2 chain-containing laminin isoforms to the cell surface, and their interaction with other extracellular matrix components. The incorporation of the laminin β2 chain into the basement membrane at the appropriate time and place during morphogenesis may be precisely controlled by a variety of cellular factors. Detailed knowledge of the structure of the laminin β2 gene will facilitate attempts to elucidate these factors.