Sequence, Structure, and Chromosomal Mapping of the Mouse Lgals6 Gene, Encoding Galectin-6*

We reported that the mouse gastrointestinal tract specifically expresses two closely related galectins, galectins-4 and -6, each with two carbohydrate recognition domains in the same peptide. Here, we report the isolation, characterization, and chromosomal mapping of the complete mouse Lgals6 gene, which encodes galectin-6, and of a fragment of a distinct gene, Lgals4, which encodes galectin-4. The coding sequence of galectin-6 is specified by eight exons. The upstream region contains two putative promoters. Both Lgals6 and the closely related Lgals4 are clustered together about 3.2 centimorgans proximal to the apoE gene on mouse chromosome 7. The syntenic human region is 19q13.1–13.3.

Galectins (1, 2) are a family of proteins that have at least one carbohydrate recognition domain (CRD) with conserved sequence elements and affinity for β-galactosides. Although each galectin is abundantly expressed in only a few cell types, the distributions of the best studied galectins, galectin-1 and galectin-3, encompass a wide range of tissues and change during embryogenesis. In the accompanying paper (3), we have reported a much more restricted expression of two other galectins, galectin-4 and galectin-6, to the gastrointestinal tract both in fetal and adult mice. Galectin-4 and the newly discovered galectin-6 (3) are closely related and belong to a subfamily of galectins with two CRDs within one peptide chain, joined by a link region of variable length (4), which also includes galectin-8 (5, 6) and galectin-9 (7, 8). We here report the isolation and structure of Lgals6, the gene encoding galectin-6, and show its relationship to the structure of genes encoding galectins with a single CRD (9-14), as well as features of the upstream region that may account for the expression of galectin-6 in the gastrointestinal tract. We also demonstrate that the Lgals4 gene encoding galectin-4 is distinct from Lgals6, and that these two genes are very close together on mouse chromosome 7.

EXPERIMENTAL PROCEDURES
Materials and General Methods-Unless otherwise indicated, all nucleic acid enzymes were obtained from Boehringer Mannheim and all chemicals were from Sigma. Nitrocellulose filters were from Schleicher & Schuell, and Magnagraph nylon filters for blotting were purchased from Micron Separations Inc. (Westboro, MA). [α-32P]Deoxycytidine 5′-triphosphate (3000 Ci/mmol) and [35S]deoxyadenosine 5′-(α-thio)triphosphate (1000-1500 Ci/ml, sequencing grade) were purchased from NEN Life Science Products. For general molecular biological techniques such as hybridization screening, restriction, gel electrophoresis, blotting, and elution, we followed protocols collected by Maniatis et al. (15).
Oligonucleotides and Polymerase Chain Reactions (PCR)-Oligonucleotides are listed in Table I. For probing of Southern blots, the oligonucleotides were labeled with digoxigenin by 3′ tailing using digoxigenin-11-dideoxyUTP and terminal deoxynucleotide transferase, and visualized by chemiluminescence after treatment with conjugated antidigoxigenin and using reagents and procedures from Boehringer Mannheim. Hybridization was done at 37°C in hybridization buffer (200 mM Na2HPO4, pH 7.2, 7% SDS, 1% bovine serum albumin, 15% formamide, 1 mM EDTA), and blots were washed for 10 min at room temperature in 2× SSC, 1% SDS.
PCR was carried out using Ampli-Taq (Perkin-Elmer). One μl of different dilutions of template was mixed with 25 pmol of each primer, buffer (10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin), and 250 μM deoxynucleotides. Amplification consisted of 45 cycles: 40 s of denaturation at 96°C for the first five cycles and 94°C for the remaining cycles, 1 min of annealing at 60°C, and 1-4 min of extension at 72°C. Amplified fragments were visualized and purified on 1% agarose gels, stained with ethidium bromide. Excised fragments were electroeluted, phenol-extracted, and precipitated with ethanol.
Isolation of Lgals6 and Subcloning-A mouse genomic DNA (strain 129/SV) library in λFIXII (Stratagene, La Jolla, CA) was screened with a cDNA probe containing all the coding sequence but no untranslated sequence of rat galectin-4 (16). The probe was labeled with [α-32P]dCTP by random primer polymerization (17) and used in hybridization screening (15) of approximately 1 × 10^6 plaques using Escherichia coli SRB as host. The hybridization was done in hybridization buffer (see above) plus 20% dextran sulfate at 52°C with 2.4 × 10^5 cpm/ml probe. Washes were done at the hybridization temperature, first in 2× SSC (15), 1% SDS, then in 0.2× SSC, 0.1% SDS, 30 min each. After drying, the filters were autoradiographed, using X-Omat film (Eastman Kodak Co.) and intensifying screens at −70°C.
One phage clone, Lgals6, was isolated by plaque purification, and its DNA was purified from high titer liquid culture. The lysate was centrifuged at 6000 × g for 20 min, and the supernatant was treated with 10 μg/ml DNase and 20 μg/ml RNase, after which the phage were precipitated for 1 h at 4°C with 10% PEG 8000 in 5 mM Tris-HCl, pH 7.5, 0.5 M NaCl, 5 mM MgSO4 (final concentrations). The pellet was resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM MgSO4, and extracted with phenol and chloroform. Finally, the phage DNA was precipitated with isopropanol and resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 1 mM EDTA.
The purified DNA from Lgals6 was digested by XbaI, and two of the three resulting fragments, those that hybridized with the rat galectin-4 cDNA probe, were subcloned into pBluescript SK+ (Stratagene, La Jolla, CA), generating clones pLgals6-1 and pLgals6-2. pLgals6-3, containing a DNA fragment spanning the junction between clones pLgals6-1 and pLgals6-2, was isolated by PCR between primers mG6M and rG4F using Lgals6 as template followed by cloning into pCRII (Invitrogen, San Diego, CA).
Sequencing-The different subclones were sequenced using primers synthesized based on rat galectin-4 sequence (16), and later, mouse galectin-6 sequence (see Table I), as well as vector-specific primers. In most cases, we used a modification (10) of the Sanger technique (18) using Sequenase (U. S. Biochemical), as described by the manufacturer. Denatured double-stranded DNA prepared by the method of Kraft et al. (19) was the template. To eliminate artifactual banding caused by presumed secondary structure in intron 7, we used the method described by McCrea et al. (20) employing a terminal deoxynucleotidyltransferase chase after the termination reaction. All exonic regions, intron boundaries, upstream and downstream sequences were verified by sequencing on both strands, except for the 3′ end of exon 6, which, because of the repetitive DNA in intron 6, was confirmed by sequencing with different primers on the same strand.
Isolation of a Fragment of Lgals4 by PCR-Inbred mouse strain 129/SV genomic DNA (Jackson Laboratories, Bar Harbor, ME) was amplified by means of oligodeoxynucleotides representing sequences distributed throughout the galectin-4 gene. Oligonucleotides mG6F and rG4C gave clear non-cDNA-sized bands on amplification, and therefore a sample of the reaction was ligated into plasmid pCRII (Invitrogen). DNA of selected clones was sequenced using T7 and M13 reverse primers, and the gene-specific primers mG6H and mG6K to obtain sequence on both strands.
Restriction Map-The size of each intron was determined by one of several methods. Introns 1, 4, 5, and 7 were sequenced completely. The size of intron 3 was determined by ApaI restriction digest analysis of clone pLgals6-1. Introns 2 and 6 were sized by PCR amplification between exonic primers surrounding the respective intron to generate fLgals6-1b and pLgals6-2h (Fig. 1). The identity of the PCR products was confirmed by sequencing the ends of each fragment, and the size was determined by gel electrophoresis. Intron sizes aided in the analysis of restriction digest data of both pLgals6-1 and pLgals6-2.
Primer Extension-We used a modified version of the procedure summarized by Ausubel et al. (21). For galectin-6, we used the antisense primer mG6Q (Table I), and as controls we used the antisense primers corresponding to mouse β-actin (GenBank™ accession no. X03672; CACATGCCGGAGCCGTTGTCGACGACCAGC) and GAPDH (GenBank™ accession no. M32599; TCTCCACTTTGCCACTGCAAATGGCAGCCC). The primers were labeled with [γ-32P]ATP and polynucleotide kinase and purified by ethanol precipitation in the presence of ammonium acetate as described (15). After resuspension in 100 μl of TE, 3.5 μl of the labeled primer was combined with 10 μl of mouse colon RNA, 1.5 μl of hybridization buffer, and heated for 90 min at 65°C and then cooled to room temperature. Buffer, dNTPs, actinomycin D, 1 unit/μl RNasin (Promega), and avian myeloblastosis reverse transcriptase (Boehringer Mannheim) were then added to the hybridization mixture and incubated for 1 h at 42°C. After RNase digestion and phenol extraction, the cDNAs were precipitated with ethanol, washed, then resuspended in loading buffer (47.5% formamide, 10 mM EDTA, 0.025% bromphenol blue, 0.025% xylene cyanol FF) and denatured for 5 min at 80°C, before electrophoresis on an 8 M urea 8% polyacrylamide sequencing gel. Molecular weight marker was prepared by digesting φX174 DNA (Life Technologies, Inc.) with HinfI and then 5′ labeling with [γ-32P]ATP (15).
Genomic Southern Blots and Chromosomal Mapping-The chromosomal localization of Lgals4 and Lgals6 was mapped by restriction fragment length polymorphism (RFLP) linkage analysis in an interspecific backcross between Mus spretus and C57BL/6J mice ((C57BL/6J × Mus spretus) F1 × C57BL/6J) (22). At first, a Southern blot of genomic DNA from both C57BL/6J and M. spretus digested with several different restriction enzymes (BamHI, BglII, EcoRI, HindIII, MspI, PstI, PvuII, SstI, TaqI, and XbaI) was probed with either the insert from pLgals6-1c (Fig. 1) specific for Lgals6, or the rat galectin-4 cDNA detecting both Lgals4 and Lgals6. MspI- and HindIII-digested DNA resulted in different sizes of hybridizing bands from the two parental strains (RFLPs) for the Lgals6 probe and galectin-4 cDNA probe, respectively. DNA extracted from 66 progeny of the backcross was cut with MspI or HindIII, electrophoresed, blotted, and hybridized with the appropriate probe. The pattern of M. spretus-specific bands in the 66 progeny was then compared with patterns of parental polymorphic bands observed for other, previously mapped, genes to obtain linkage with other markers.

TABLE I (footnotes)
a Oligonucleotides that were designed based on the mouse galectin-6 gene sequence are named mG6X, and those designed based on the rat galectin-4 sequence are named rG4X. Some oligonucleotides are shown aligned to the galectin cDNA sequences in Fig. 2 of the accompanying paper (3).
b The location of the cognate sequence for each oligonucleotide is indicated as u for upstream, xn for exon n, and in for intron n, and d for downstream.
c Sense oligonucleotides are indicated by a + and antisense oligonucleotides by a −.

RESULTS AND DISCUSSION
Cloning and Sequencing of the Gene Encoding Galectin-6 -The clone Lgals6 was isolated by screening a mouse (strain 129/SV) genomic λFIX-II library using rat galectin-4 cDNA as a probe, and characterized by restriction mapping, subcloning and sequencing as shown in Figs. 1-3. The insert was split into two 4.8-kb fragments and one 3.7-kb fragment by XbaI. One of the 4.8-kb fragments and the 3.7-kb fragment were subcloned into pBluescript SK+ (Stratagene), with resultant colonies (pLgals6-1 and pLgals6-2, respectively) hybridizing to the rat galectin-4 cDNA probe (Fig. 1).
Sequencing the ends revealed that the 4.8-kb insert of pLgals6-1 contained λFIX-II sequence (stippled in Fig. 1) and thus came from one end of the Lgals6 insert, whereas the 3.7-kb insert in pLgals6-2 lacked λFIX-II sequence and thus came from the middle of the Lgals6 insert (Fig. 1). Moreover, the sequence of a DNA fragment (pLgals6-3) spanning the junction between pLgals6-1 and pLgals6-2 inserts showed that they are joined together and no intervening fragment had been overlooked. Probing of Southern dot blots of pLgals6-1 and pLgals6-2 with oligonucleotides revealed that pLgals6-1 contained the 5′ end of the gene and pLgals6-2 contained the 3′ end of the gene.
To sequence the gene, additional subclones were generated from pLgals6-1 and pLgals6-2 as described in Fig. 1, and sequenced with both vector-specific and gene-specific oligonucleotide primers (Table I). The sequencing "strategy" and restriction map are shown in Fig. 2, and the sequence in Fig. 3. The two characterized subclones pLgals6-1 and pLgals6-2 together contained all the galectin-6 coding sequence (as determined in the accompanying paper (3)) encompassing about 5,500 bp including introns. pLgals6-1 also contained 1,100 bp of upstream sequence and pLgals6-2 contained 1,800 bp of downstream sequence. This gene is named Lgals6 in accordance with the naming of other galectin genes (23). All of the partial galectin-6 cDNA sequence (3) was represented within Lgals6 and was identical to the determined gene sequence with the exception of three base changes in exon 4 (nt 384, 447, and 461 in the cDNA), which could be ascribed to the different strain sources of the RNA and genomic DNA.
This group of three exons (stippled in Fig. 4) encode the tightly folded canonical galectin CRDs as revealed in the crystal structures of galectins-1, -2, and -10 (2, 24-26), with the middle exon of each group of three exons encoding all of the residues interacting directly with bound carbohydrate. The sites of the boundaries of these exons within the Lgals6 sequence appear to be highly conserved with one exception (Fig. 5).

FIG. 3 (partial legend). Asterisks are placed over E box sequences (35), and plus signs demarcate the sequence that strongly resembles the intestinal-specific regulatory element of the apolipoprotein B gene (32). The dollar signs and pound signs indicate possible exon-intron and intron-exon boundary sites, respectively. Numbering is from the first nucleotide of the translational initiation site. Restriction sites are indicated below the pertinent sequences. Repetitive elements in introns 2, 5, 6, and 7 are designated by alternating underlines and overlines. The consensus polyadenylation signal is indicated by asterisks over the site.
CLC gene encode the first few amino acids that are disordered in the crystal structures of galectin-1, -2, and -10 (24 -26), and exons 2 and 3 in Lgals3 encode other domains in galectin-3 with no sequence similarity to the carbohydrate-binding domain.
The sequence encoded by exon 5 of galectin-6 forms most of the link region between the two CRDs; the rest of the link region is, as mentioned above, encoded by the last part of exon 4. Considering the high amount of sequence identity between galectin-4 and galectin-6 elsewhere (3), it is notable that galectin-6 has a link region that is 24 amino acids shorter. If this marked structural difference had arisen because of a mutation in sequences involved in splicing, then a mutated vestige of the "missing" 72 nt should be found within intron 4. However, the complete sequencing of intron 4 gave no evidence for such a sequence. Hence, either the galectin-6 gene underwent a deletion in its evolution or the galectin-4 gene had an insertion or duplication.
For another bi-CRD galectin, galectin-9, a variation of link region length appears instead to be caused by alternative splicing. In this case, alternative splicing was proposed to account for the insertion of 93 nucleotides coding for an additional 31 amino acids at the beginning of the link region (Ref. 8; see also

To substantiate this matter, we sought further evidence based on the genomic sequence. Computer analysis of the entire Lgals6 sequence using the program FGENEH, 2 which tries to reconstruct coding sequence by searching for spliceable open reading frames and other criteria (27), predicted nt 1 as the translation start site. The few ATG codons in the preceding sequence are unlikely to act as translation start sites because they are followed by multiple in-frame stop codons. HSPL, another program available at the same web site 2 that is specifically designed to identify intron/exon boundaries, also did not predict any splicing within the upstream region that would remove these stop codons.
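The argument about upstream ATG codons can be illustrated with a short sketch. It is not the FGENEH or HSPL analysis; the function name `upstream_atg_report` and the toy sequence are hypothetical, and the check only asks whether each ATG located 5′ of the chosen start codon runs into an in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_atg_report(seq, main_start):
    """List every ATG located 5' of main_start (0-based index of the main ATG).

    Returns (atg_index, n) pairs, where n is the number of codons from the
    upstream ATG to the first in-frame stop codon, or None if no in-frame
    stop is found before the end of the sequence.
    """
    seq = seq.upper()
    report = []
    for i in range(main_start):
        if seq[i:i + 3] != "ATG":
            continue
        codons_to_stop = None
        # Walk codon by codon in the reading frame set by this upstream ATG.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                codons_to_stop = (j - i) // 3
                break
        report.append((i, codons_to_stop))
    return report

# Toy sequence (hypothetical, not Lgals6): one upstream ATG, then the main ATG.
toy = "CCATGAAATAGGCCACCATGGCTTCT"
main_atg = toy.find("ATG", 3)              # index of the second ATG
print(upstream_atg_report(toy, main_atg))  # [(2, 2)]
```

An upstream ATG that reports a small codon count is closed by a nearby in-frame stop and is therefore a poor candidate for the true initiation codon, which is the reasoning applied to the Lgals6 upstream region.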
Visual identification and confirmation by the TSSW program 2 located two possible promoters with TATA boxes at −475 and −79 nt. TSSW tries to predict promoters by weighing together the likelihood of a large number of transcription factor binding sites (28) using a modification of the method of Prestridge et al. (29). No other promoters were predicted within the entire Lgals6 sequence. The location of the suggested promoter at −79 nt is consistent with a transcription initiation site at about −50 nt and translation initiation site at ATG at nt 1-3. The location of the promoter at −475 nt predicts transcription initiation at about −450 nt but, as mentioned above, translation initiation at nt 1 is most likely the case here as well.
FIG. 4. Organization of Lgals6 compared with other galectin genes. The coding exon sequences are denoted by boxes, stippled for sequence that is part of the tightly folded carbohydrate-binding domain and open for other sequence. The exon number is given in or above each box, and the number of nucleotides in the coding sequence below each box. The first exon of mouse Lgals3 (not shown) does not code for any translated amino acids (12, 13). References are as follows: human LGALS1 (10), murine Lgals1 (47), human LGALS2 (11), murine Lgals3 (12, 13), murine Lgals6 (this paper), human CLC encoding galectin-10 (14), and chicken C-14 gene (9).

FIG. 5. Comparison of exon boundaries within the carbohydrate binding domains of several galectins. The galectin-6 amino acid sequence has been aligned to the sequence of human galectins-1 and -2 (10, 11) and -10 (14), and mouse galectin-3 (12, 13). Exon boundaries are indicated by vertical bars. Conserved residues interacting with bound carbohydrate (25) are indicated with asterisks under the sequences.

To identify the major transcription initiation site(s), we performed a primer extension experiment. With an antisense primer (mG6Q, Table I) hybridizing with sequence between nt 62 and 32 downstream of the putative translational start codon, a 113-nt primer extension product was generated (Fig. 6), which would correspond to a transcription start site at nt −51. No longer products were detected. In control experiments, the size of the longest primer extension products using an actin-specific primer and a GAPDH-specific primer agreed with the reported transcriptional initiation sites (Fig. 6). Moreover, the predicted transcriptional start site for galectin-6 is 24 nt downstream of the TATA box at nt −79, and conforms well with the consensus transcriptional initiation site (30).
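The step from a 113-nt extension product to a start site at nt −51 is simple coordinate arithmetic, sketched below. The sketch assumes a numbering in which the A of the ATG is nt 1 and the preceding nucleotide is nt −1, with no position 0; that assumption is not stated explicitly in the text, but it reproduces the reported value. The function names are hypothetical.

```python
def to_offset(pos):
    """Map gene numbering (..., -2, -1, 1, 2, ...; no position 0) to a continuous offset."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos - 1 if pos > 0 else pos

def from_offset(offset):
    """Inverse of to_offset."""
    return offset + 1 if offset >= 0 else offset

def transcription_start(primer_5prime_pos, product_length):
    """Position of the 5' end of the mRNA, given where the 5' end of the
    antisense primer anneals and the length of the extension product."""
    last = to_offset(primer_5prime_pos)   # 5' end of the primer on the mRNA
    first = last - product_length + 1     # far end of the extended cDNA
    return from_offset(first)

# Values from the text: primer 5' end at nt 62, product of 113 nt.
print(transcription_start(62, 113))  # -51
```

Under the same assumption, the 5′ untranslated region of this transcript would be 51 nt long (nt −51 through −1).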
FIG. 6 (partial legend). The cDNAs produced were electrophoresed along with the molecular weight markers (HinfI-digested φX174, labeled with 32P; sizes indicated to the right). The arrowhead indicates the major galectin-6 cDNA formed (113 nt). A schematic is shown at the top. The distance between the 5′ end of the antisense primers and the ATG is shown to the right, and the length of the 5′ untranslated RNA is shown to the left as reported for β-actin (48) and GAPDH (49), and deduced here for galectin-6.

FIG. 7. Repeated sequences in intron 2 and intron 6. For intron 2 the first copy is shown at the top, and for intron 6 a consensus is shown at the top. Below are shown the repeated sequence(s) with identical residues indicated by a dot, gaps by a dash, and indeterminate nucleotides by an X. For intron 6, the numbers along the left refer to repeat number (1-5 adjacent to exon 6 and n-3 to n adjacent to exon 7), and (//) indicates the part of the intron that was not sequenced.

Some of the primer extension product in lane b of Fig. 6 may be due to galectin-4 because two recently reported mouse galectin-4 expressed sequence tags (GenBank™ accession numbers AA265412 and AA499921) suggest that mouse galectin-4 is almost identical to galectin-6 between about −50 nt and 62 nt. However, even if this were the case, the 113-nt product must also derive from galectin-6 since no other primer extension product was found. Moreover, the amounts of galectin-4 and galectin-6 mRNAs are within the same order of magnitude (3) and therefore, both would be detected in this experiment.
In conclusion, the main transcription start site for galectin-6 mRNA in normal adult colon is probably at −51 nt. Since the distal putative promoter (at −475 nt) lies within a 29-bp direct repeat of the sequence of the confirmed proximal promoter, it is reasonable that it would be active as well, perhaps under other physiological conditions and other parts of the intestine.
The translational initiation site in the transcript from the proximal promoter is predicted by the rules of Kozak (31) to be the ATG at nt 1-3 since this is the first ATG and it is also in a favorable context. As with all other known galectins, we found no evidence for a signal sequence or transmembrane sequence in the galectin-6 gene. This indicates that galectin-6, like other galectins, is expressed mainly as a soluble cytosolic protein, but may be secreted by non-classical mechanisms (2).
Upstream Regulatory Elements-In the accompanying report, we provide extensive evidence that expression of galectin-6 is limited to the gastrointestinal tract. We therefore searched the upstream region for the presence of any regulatory elements that are involved in tissue-specific expression of other intestinally expressed genes. We found a sequence between bp −354 through bp −367 (indicated by + signs in Fig. 3) that is 72% identical to part of a 19-bp sequence within the apolipoprotein B upstream region that has been implicated in intestine-specific expression of this protein (32). This element is a strongly positive inducer of expression together with other sequences, and can also by itself confer expression of a reporter gene in the intestinal cell line Caco-2, as well as in the hepatoma HepG2. Screening of the upstream region against a database of mammalian transcription factor binding sites using MatInspector (33) 3 revealed a wide variety of well known possible regulatory elements. Notable among those are six E boxes (at bp −70, −295, −336, −382, −415, and −466, indicated by asterisks in Fig. 3). One resembled a MycMax binding site, whereas others resembled MyoD binding sites. Such E boxes have been implicated in the regulation of gene expression in proliferating and differentiating epithelial cells (see, e.g., Refs. 34 and 35), but also in the expression of other genes in other tissues. Although the upstream sequences of Lgals6 do not permit prediction of the regulation of galectin-6 expression without further experiments, these sequences are clearly different from upstream regions of the genes encoding galectin-1 and -2 (10, 11) or galectin-3 (12, 13).
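A minimal sketch of how E boxes can be located in a sequence is given below. It is not the MatInspector search used here, which scores matches against a database of binding-site matrices; it only scans for the generic E box consensus CANNTG, and the toy sequence is hypothetical rather than taken from Lgals6.

```python
import re

# Generic E box consensus CANNTG; the lookahead allows overlapping matches.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based index, matched hexamer) for every CANNTG in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]

# Toy sequence (hypothetical) carrying two E boxes with different cores.
toy = "GGCACGTGTTCAGCTGAA"
print(find_eboxes(toy))  # [(2, 'CACGTG'), (10, 'CAGCTG')]
```

A consensus scan of this kind reports every hexamer that fits the pattern, so the two central nucleotides and the flanking sequence still have to be inspected to judge which factor, if any, a given site resembles.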
In addition, it is clear that the regulatory elements governing the two promoters in Lgals6 differ, suggesting that they may respond to different environmental or developmental stimuli. It is noteworthy that the mouse Lgals3 gene encoding galectin-3 contains two promoters as well (12,13), generating two different mRNAs encoding the same protein (36) but under different regulation (37).
Untranslated 3′ Sequence-The sequence 3′ of the stop codon in Lgals6 is very similar to the 3′-untranslated sequence of rat galectin-4 (Ref. 16; see also Fig. 2 in the accompanying paper (3)) up to a consensus polyadenylation signal AATAAA 51 bp after the termination codon. Downstream of the polyadenylation signal there is a (GT)26 dinucleotide repeat. Besides sometimes being useful as polymorphic markers, such GT repeats have been implicated in message processing (38). GT repeats also may form Z-DNA (39), which binds specific proteins (40) and may modify nucleosome structure (41), thereby affecting transcription.
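Both features described in this paragraph are easy to locate by pattern matching, as the following sketch shows. The function names are hypothetical, and the toy sequence is constructed by hand to mimic the described layout (51 nt, then AATAAA, then a GT run); it is not the Lgals6 sequence, and the exact convention behind "51 bp after the termination codon" is an assumption.

```python
import re

def polya_signal_offset(seq_after_stop):
    """Number of nucleotides between the stop codon and the first AATAAA,
    or None if the signal is absent. The input starts at the first base
    after the stop codon."""
    i = seq_after_stop.upper().find("AATAAA")
    return None if i < 0 else i

def longest_gt_repeat(seq):
    """Length, in GT units, of the longest uninterrupted (GT)n run."""
    runs = re.findall(r"(?:GT)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)

# Hand-built toy 3' sequence: filler, poly(A) signal, spacer, (GT)26.
toy = "C" * 51 + "AATAAA" + "TTC" + "GT" * 26 + "A"
print(polya_signal_offset(toy))  # 51
print(longest_gt_repeat(toy))    # 26
```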
Introns-When the Lgals6 sequence was plotted in a dot matrix plot against itself, 4 several repetitive sequences were revealed.
The last 100 bp of intron-2 consist of an almost perfect 50-bp tandem duplication (Figs. 3 and 7, top). The sequence of this repeat did not resemble any other known repeated sequence. It ends at the splice acceptor site and encodes an open reading frame, which, however, is out of frame with exon-3.
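In a self-comparison dot matrix, a tandem duplication appears as a line parallel to the main diagonal, displaced by the length of the repeat unit. The sketch below scores each such displaced diagonal directly. It is an illustration only, not the plotting program used for the analysis; the function names are hypothetical and the 100-nt toy sequence is randomly generated, with one mismatch introduced to mimic an "almost perfect" duplication.

```python
import random

def shifted_identity(seq, shift):
    """Fraction of positions i at which seq[i] == seq[i + shift], i.e. the
    match density along the diagonal displaced by `shift` in a self dot plot."""
    pairs = list(zip(seq, seq[shift:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def best_tandem_period(seq, min_shift=2, max_shift=None):
    """Return (shift, identity) for the displaced diagonal with most matches."""
    if max_shift is None:
        max_shift = len(seq) // 2
    scores = {s: shifted_identity(seq, s) for s in range(min_shift, max_shift + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy data (hypothetical): a random 50-nt unit duplicated in tandem,
# with a single substitution in the second copy.
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(50))
second = list(unit)
second[10] = "A" if unit[10] != "A" else "C"
toy = unit + "".join(second)

period, identity = best_tandem_period(toy)
print(period, identity)
```

For this toy sequence the strongest diagonal is expected at a displacement of 50 nt with 49 of 50 positions matching, whereas unrelated displacements in random sequence match at roughly one position in four.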
All the known sequence of intron-6, except for the first 3 nt and last 40 nt, consists of a 30-nt repeating sequence (Fig. 7, bottom). This repeating sequence has not been reported before, but it resembles a mouse mini-satellite DNA (42).
Intron 7 contains seven repeats of the pentanucleotide ACCTC. The ACCTC sequence occurs as six tandem repeats in the opposite orientation in intron 3 of the mouse NCAM gene (43), but the significance is unknown. The remainder of intron 7 3′ of the pentanucleotide repeat also contains repetitive sequence consisting of about 80% C and 20% T on the sense strand. This region was remarkably refractory to sequencing by the standard protocols. We were able to read this sequence only when we used the protocol described by McCrea et al. (20), which employs a tailing chase to dilute prematurely terminated chains.
Two Distinct Genes Encoding Galectin-4 and Galectin-6 -Although galectin-4 and galectin-6 are very similar, the distribution of differences along the whole coding sequence suggests that they are encoded by separate genes rather than being alleles or products of alternative splicing. We confirmed this by isolating a fragment of the galectin-4 gene by PCR from the genomic DNA of the same homozygous mouse strain, 129/SV, from which we isolated the galectin-6 gene. The coding sequence of the galectin-4 gene fragment was identical to the overlapping parts of the galectin-4 cDNA clones (3), and showed the expected differences from galectin-6 coding sequence (Fig. 8). Surprisingly, some intronic sequence is also remarkably similar between the two genes, suggesting that Lgals6 and Lgals4 must have diverged relatively recently.
Further proof of the existence of two separate genes is provided by genomic Southern blots. When EcoRI-digested mouse DNA was hybridized with an upstream Lgals6-specific probe (the insert of pLgals6-1c, Fig. 1), one band was observed (Fig. 9,  lanes a-c), whereas with a rat galectin-4 cDNA probe that recognizes both genes, two bands were observed (lanes g-i).
Since there are no EcoRI sites within the Lgals6 gene (Fig. 2), the second cDNA-detected band must correspond to Lgals4. Again, for HindIII-digested mouse DNA, only one band is detected by the Lgals6-specific probe (Fig. 9, lanes d-f), whereas additional stronger bands are detected by the rat galectin-4 cDNA probe (lanes j-l). These data can only be explained by the presence of two genes that are highly homologous.
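The band-counting logic can be made explicit with a small sketch: a probe detects every restriction fragment it overlaps, so a probe lying in a region free of sites detects a single fragment from that locus. The function names and the toy sequence are hypothetical; the only enzyme fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def cut_positions(seq, site="GAATTC", cut_offset=1):
    """0-based cut coordinates for a restriction site on a linear sequence.
    The defaults describe EcoRI (G^AATTC)."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    return cuts

def fragments_hit_by_probe(seq_len, cuts, probe_start, probe_end):
    """Fragments (as half-open intervals) that overlap the probe region
    [probe_start, probe_end); each one would give a hybridizing band."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [(a, b) for a, b in zip(bounds, bounds[1:])
            if a < probe_end and probe_start < b]

# Toy locus (hypothetical): two EcoRI sites flanking a site-free region.
toy = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 10
cuts = cut_positions(toy)
print(cuts)                                             # [11, 37]
print(fragments_hit_by_probe(len(toy), cuts, 15, 30))   # [(11, 37)]
```

A probe confined to the site-free interval overlaps one fragment, so a second band seen with a less specific probe has to come from another locus, which is the inference drawn here for Lgals4.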
Chromosomal Localization of Genes Encoding Galectin-4 and Galectin-6 -The chromosomal location of Lgals6 was mapped by linkage analysis of RFLPs in an interspecific backcross between M. spretus and C57BL/6J (22). The Lgals6-specific upstream probe detects one unique band in EcoRI and HindIII digested DNA from either parent or F1 hybrids (Fig. 9). An RFLP found for the restriction enzyme MspI (not shown) was used for mapping. A Southern blot of MspI-digested DNA from 66 offspring of backcrosses of the F1 with the C57BL/6J parental produced a pattern that was most coincident with several markers on chromosome 7. The frequency of differences was used to calculate distances from Lgals6 to these markers (Fig. 10).
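The conversion from "frequency of differences" to a map distance can be sketched as follows. In a backcross each progeny reports one meiosis, so for closely linked loci the percentage of animals in which two markers disagree approximates the distance in centimorgans, with a binomial standard error. The function names, the one-letter typing code, and the counts are hypothetical and are not the data behind Fig. 10.

```python
import math

def recombinants_between(typing_a, typing_b):
    """Count progeny whose alleles differ between two loci. Each string has
    one character per animal, e.g. 'S' for an M. spretus-specific band
    present and 'B' for C57BL/6J pattern only."""
    if len(typing_a) != len(typing_b):
        raise ValueError("both loci must be typed in the same animals")
    return sum(a != b for a, b in zip(typing_a, typing_b))

def map_distance_cM(recombinants, total):
    """Recombination percentage and its binomial standard error."""
    r = recombinants / total
    se = math.sqrt(r * (1 - r) / total)
    return 100 * r, 100 * se

# Hypothetical example: 2 discordant animals among 66 backcross progeny.
distance, error = map_distance_cM(2, 66)
print(round(distance, 1), round(error, 1))  # 3.0 2.1
```

With only 66 animals the standard error is of the same order as the distance itself, so distances of a few centimorgans obtained this way are approximate.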
Since the galectin-4 probes we used also react with DNA encoding galectin-6, we achieved specific mapping of Lgals4 by analyzing a HindIII polymorphism that is detected with these probes but not with the Lgals6-specific probe (Fig. 9, lanes i-l), and therefore is uniquely associated with Lgals4. The Lgals4 mapped to the same region on chromosome 7 as Lgals6. Such close linkage was previously found for the human LGALS1 and LGALS2 genes (23) encoding galectin-1 and galectin-2, respectively, and certain C-type lectins (44). The mapped genes in this region on mouse chromosome 7 are syntenic with the q13.1-13.3 region of human chromosome 19, suggesting that the human homolog(s) are likely to be found there. Interestingly, the genes encoding galectin-7 (45) and galectin-10 (the Charcot-Leyden crystal protein) (46) also map to human chro-  Table II.