Mouse Zp1 encodes a zona pellucida protein homologous to egg envelope proteins in mammals and fish.

Zp1 encodes one of the three major glycoproteins of the zona pellucida, an extracellular matrix that surrounds growing oocytes, ovulated eggs, and preimplantation embryos. The mouse gene is composed of 12 exons ranging in size from 82 to 364 base pairs and spans 6.5 kilobase pairs on chromosome 19 (2.13 ± 1.5 centimorgans distal to D19Bir1). The Zp1 exon map is similar to ZPB, a human orthologue, and an E-box (CANNTG), implicated in oocyte-specific gene expression of mouse Zp2 and Zp3, is similarly located upstream of the transcription start site. The single copy Zp1 gene encodes a 623-amino acid protein, the carboxyl-terminal half of which is significantly similar to a corresponding region of mouse ZP2. The conservation of this same region in a fish egg envelope protein suggests that not only has this protein domain been duplicated in mammals but that it has been conserved and used as an egg envelope protein in species that diverged 650 million years ago.

Among vertebrates, different reproductive strategies have evolved based on mating behavior, gamete structures, and the specificity of recognition molecules on the surface of sperm and eggs. In all vertebrates, however, a prerequisite to successful fertilization is penetration of sperm through an acellular envelope surrounding ovulated eggs. In mammals, capacitated sperm bind in a seemingly non-site-directed manner to the zona pellucida. Following the induction of the acrosome reaction and release of lytic enzymes, sperm penetrate the zona and fuse with the egg's plasma membrane, triggering the postfertilization block to polyspermy (1). In contrast, most fish sperm lack an acrosome and penetrate the vitelline envelope surrounding fish eggs via a discrete micropyle (2). Most commonly, the micropylar channel is sufficiently narrow to permit the passage of a single sperm, and subsequent fusion with the plasma membrane induces the cortical granule reaction, resulting in a block to polyspermy (3). It has become increasingly clear that the proteins of the zona pellucida are conserved among eutherian mammals and that the proteins of the vitelline envelope are conserved among teleostean fish. More recently, it has become apparent that, although critical for speciation, the proteins from the mammalian egg envelope are distinctly related to those of the teleostean envelope.
The mouse zona pellucida contains three major glycoproteins: ZP1, ZP2, ZP3. Genes encoding the latter two zona proteins have been characterized. Zp2 is composed of 18 exons (4), of which six encode a 241-amino acid domain reported as 28% identical with the wf& protein of the white flounder teleost (5). Zp3 contains eight exons (6,7), of which the first six encode a 261-amino acid domain that is 33% identical with ZI-3, a major component of the inner layer of the egg envelope of a second teleost, Oryzias latipes (8). Although similar structural domains are present in egg envelope proteins of teleosts and eutherian mammals, the site of synthesis is quite different in these two classes of vertebrates. In mice, the three zona genes (Zp1, Zp2, Zp3) are transcribed exclusively in growing oocytes (4,9,10), and the resultant zona proteins are secreted to form the extracellular matrix. In contrast, there is growing evidence that proteins of the two teleost egg envelopes are produced in the liver after stimulation with estrogens and then transported to the egg where they form the vitelline envelope (5,8).
We have previously reported the characterization of mouse Zp2 and Zp3 genes. We now report the characterization of mouse Zp1 and compare the encoded protein to other egg envelope proteins in mammals and fish.

MATERIALS AND METHODS
Screening a Mouse Genomic Library-A total of 1.8 × 10^6 bacteriophage of a 129/Sv mouse genomic library (Stratagene) were screened by plaque hybridization (11) with a 32P-labeled mouse ZP1 cDNA (10). Phage DNA was isolated and digested with NotI, and the insert was subcloned into SuperCos 1 cosmid (Stratagene). Cosmid DNA was amplified in XL1-Blue MR cells (Stratagene) grown overnight at 30°C (LB broth, 50 µg/ml ampicillin) and purified with a Plasmid Maxi kit (Qiagen). The sequence of the genomic insert was determined by Me2SO-modified dideoxy chain termination (12) using α-35S-dATP (Amersham Corp.) and the Sequenase sequencing kit (version 2.0; U. S. Biochemical Corp.). Both strands of the coding regions were sequenced. Sequence analysis was performed using the Genetics Computer Group (program manual for the Wisconsin Package, version 8, 1994) and PCGene (IntelliGenetics) computer software.
Intron-Exon Map Determination-The transcription start site was determined by S1 nuclease protection. A 5′ genomic fragment, −197 to 94, was amplified by polymerase chain reaction using isolated Zp1 as a template, and synthetic oligonucleotides were derived from the genomic sequence. After subcloning the fragment into TA cloning vector (Invitrogen), the resultant plasmid (pMoZP1.5) was linearized with HindIII. A 32P-end-labeled synthetic oligonucleotide primer complementary to map positions 40-62 of ZP1 cDNA was used in an asymmetric polymerase chain reaction to generate a single-stranded 5′ genomic probe (291 nucleotides). 0.5 ng of this probe was hybridized to 30 µg of mouse ovarian RNA or mouse liver RNA and digested with 300 units of S1 nuclease (13). The digestion products were analyzed on 6% polyacrylamide sequencing gels; sequencing reactions with the same primer were used as molecular weight markers. The remaining boundaries of exons were determined by comparison of the genomic sequence with that of ZP1 cDNA (10).
The sizes of introns were determined by DNA sequencing or polymerase chain reaction in a Perkin Elmer GeneAmp PCR System 9600 using Zp1 exon-specific forward and reverse oligonucleotide primers. The reaction conditions were as described by the Taq polymerase protocol (Perkin Elmer): 25 cycles of 95°C for 15 s, 50°C for 45 s, and 72°C for 1.5 min. The first cycle was preceded by 5 min at 99°C, and the last cycle was followed by a 7-min extension at 72°C. The polymerase chain reaction products were analyzed by agarose gel electrophoresis.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) U24227-U24230.
‡ Supported by post-doctoral fellowships from "Istituto Pasteur-Fondazione Cenci Bolognetti" and from "Stiftelsen Blanceflor Boncompagni-Ludovisi, nee Bildt".
Southern Blot Analysis-10 µg of mouse 129Sv genomic DNA (Jackson Laboratory) and 0.05 µg of the Zp1 cosmid were digested with restriction enzymes SacI, BamHI, EcoRV, NcoI, KpnI, and HindIII. The digested samples were separated on a 0.7% agarose gel and transferred to a Nytran membrane (Schleicher & Schuell). A ZP1 cDNA fragment (150-1963 bp) was 32P-labeled by random priming (Boehringer Mannheim), according to the manufacturer's instructions, and used to probe the Southern blot in aqueous hybridization solution (6× SSC) at 65°C. Final washes were performed at 65°C with 0.1× SSC, 0.1% SDS (11). Autoradiographic exposures for the cosmid and genomic DNA digests were 15 min and 7 days, respectively.
Genetic Mapping-An interspecific back-cross panel (BSS) purchased from The Jackson Laboratory (Bar Harbor, Maine) was used to map Zp1. The panel consists of genomic DNA from 94 animals obtained by crossing (C57BL/6JEi × SPRET/Ei)F1 females to SPRET/Ei males (14). The Southern blot analysis was performed on 4 µg of DNA as described above.

RESULTS
Mouse Zp1 Gene-A single clone was isolated from a mouse 129/Sv genomic library by plaque hybridization. The 18.5-kb insert contained the 6.5-kb Zp1 transcription unit and approximately 5 and 7 kb of 5′- and 3′-flanking sequences, respectively. The start of transcription was defined by S1 nuclease protection using a single-stranded cDNA probe derived from genomic sequence spanning the putative 5′ end of the mRNA (−197 to +62 bp) (data not shown). The first exon is 229 bp long, and a 57-bp 5′-untranslated region precedes the first ATG that encodes the initiator methionine. The Zp1 locus contains 12 exons (Fig. 1), ranging in size from 82 to 364 bp, that are separated by 11 introns, 89 bp to 1.3 kb (Table I). Except for the 5′ end of exon 1 and the 3′ end of exon 12, the exon-intron splice sites conform to the GT-AG border element consensus sequence (15).
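The GT-AG rule mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper name `conforms_to_gt_ag` is hypothetical, and the intron sequences are made-up placeholders rather than Zp1 sequence.

```python
def conforms_to_gt_ag(intron: str) -> bool:
    """Return True if an intron (DNA, 5'->3') starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Placeholder introns for illustration only (not taken from Zp1).
example_introns = {
    "intron_a": "GTAAGTCTTCCTTTCAG",
    "intron_b": "GCAAGTCTTCCTTTCAG",  # non-canonical GC at the donor site
}

# Map each intron name to whether its borders match the consensus.
results = {name: conforms_to_gt_ag(seq) for name, seq in example_introns.items()}
```

In practice each of the 11 intron sequences would be passed through the same check after the exon boundaries have been assigned from the cDNA comparison.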
Overall, the exon maps of the mouse Zp1 and its human orthologue, ZPB, are remarkably conserved (Fig. 1). Although the human gene (16) is more spread out (encompassing 11 kb), the sizes of most exons are nearly identical. Only exons 3 and 12 are larger in the mouse, and it is mostly the additional sequence in the third exon that accounts for the larger mouse protein (623 amino acids) compared with the human protein (540 amino acids). Excluding the region unique to the mouse, 510 amino acids of the two proteins align; 53% of these residues are similar (42% identical) to the human ZPB.
Genetic Mapping of Zp1-A previously described collection of C57BL/6J × Mus spretus interspecific back-cross progeny (14) was used to localize Zp1 in the mouse genome. The BSS mapping panel has been typed for over 1650 loci that are distributed among all of the mouse autosomes as well as the X and Y chromosomes. Preliminary Southern blot experiments to detect polymorphisms at the Zp1 locus determined that HindIII digestion of the DNA from the two parental strains resulted in a 7.7- and a 6.5-kb fragment in C57BL/6J and M. spretus, respectively (Fig. 2A, lanes 1 and 2). DNA from the progeny of the interspecific back-cross in the BSS panel was analyzed for this restriction fragment length polymorphism by filter hybridization using the same 32P-labeled ZP1 cDNA probe (six examples are shown in Fig. 2A, lanes 3-8). Haplotype analysis (Fig. 2B) of the 94 animals detected two animals that had a recombination event between Zp1 and the proximal D19Bir1 locus (2.13% recombination frequency) and one animal that had a recombination event between Zp1 and the more distal D19Bir3 locus (1.06% recombination frequency). Thus, the Zp1 locus maps to mouse chromosome 19, 2.1 ± 1.5 cM distal to the D19Bir1 locus (Birkenmeier anonymous DNA fragment 1 on chromosome 19, see Ref. 14) and 1.06 ± 1.06 cM proximal to the D19Bir3 locus (Birkenmeier anonymous DNA fragment 3 on chromosome 19) (Fig. 2C).
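The map distances above follow from simple counting. The sketch below recomputes them from the reported recombinant counts. It assumes the error term is the binomial standard error sqrt(p(1 − p)/n); the text does not state how the error was calculated, so that formula and the function name `recombination_estimate` are assumptions made for illustration.

```python
import math


def recombination_estimate(recombinants: int, animals: int):
    """Return (recombination frequency, standard error), both in percent.

    For short intervals, percent recombination is read directly as cM.
    The binomial standard error is an assumed choice, not stated in the text.
    """
    p = recombinants / animals
    se = math.sqrt(p * (1 - p) / animals)
    return 100 * p, 100 * se


# Counts reported in the text: 94 back-cross animals in the BSS panel.
d19bir1 = recombination_estimate(2, 94)  # Zp1 versus D19Bir1
d19bir3 = recombination_estimate(1, 94)  # Zp1 versus D19Bir3

print(f"Zp1-D19Bir1: {d19bir1[0]:.2f} +/- {d19bir1[1]:.2f} cM")
print(f"Zp1-D19Bir3: {d19bir3[0]:.2f} +/- {d19bir3[1]:.2f} cM")
```

Under this assumption the two intervals come out at about 2.13 ± 1.49 cM and 1.06 ± 1.06 cM, which agrees with the reported values after rounding.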
The single genetic locus and the simple digestion patterns seen in the preliminary studies designed to detect polymorphisms at the Zp1 locus suggested that Zp1, like Zp2 and Zp3, is a low copy number gene. Six restriction enzymes recognizing hexanucleotide sites (SacI, BamHI, EcoRV, NcoI, KpnI, and HindIII) were used to digest isogenic samples of genomic DNA and of the Zp1 fragment subcloned into SuperCos 1. After digestion, Southern blots were prepared and probed with 32P-labeled ZP1 cDNA (Fig. 3). All of the restriction enzyme fragments detected in genomic DNA (three examples are shown in Fig. 3A) were present in the subcloned genomic fragment digests (Fig. 3B). These observations are consistent with there being only one copy of Zp1 in the mouse genome.
Conservation of the Mouse Zp1 Coding Region-The mouse Zp1 gene obtained from 129Sv mice encodes a polypeptide chain of 623 amino acids (Fig. 4A). There are five nucleotide differences from the cDNA sequence obtained from NIH Swiss mice (10). Two differences between the mouse strains are silent; the remaining three result in amino acid substitutions. Of these substitutions, two are conservative changes at amino acid residues 445 (Val → Leu) and 486 (Arg → Lys); the third substitution at residue 246 (Thr → Ala) is not. The effect of these polymorphisms is unclear, as both 129/Sv and NIH Swiss mice are fertile and appear to have normal zonae pellucidae. Similar polymorphisms have not been detected in different strains of mice at the Zp2 (4) or Zp3 loci (6,7,17). As noted above, the 623-amino acid mouse ZP1 protein is considerably longer than orthologues found in other mammals (16,18,19), most of which are approximately 540 residues in length. The effect of this difference on the structure or function of the mouse zona remains to be determined. In addition to the conservation among mammals, the ZP1 protein also contains a domain that is present in other zona pellucida proteins of the same species as well as in other egg envelope proteins from very disparate species (Fig. 4B). Within 348 amino acids of ZP1 (residues 268-623) that align with mouse ZP2 (residues 363-713), 47% of the amino acids are similar, 32% are identical. In Zp1, this region is encoded by exons 5-12 and, in Zp2, by exons 11-18. Although each gene is located on a different chromosome, it appears that the domain encoded by these eight exons comes from a common ancestral gene. It further appears that much of this ancestral gene was present 650 million years ago. A similar (albeit slightly smaller) domain is present in an egg vitelline envelope protein present in white flounder (5), where it is encoded by exons 2-7 of the wf& gene that correspond to exons 5-9 plus exon 11 of mouse Zp1 (Fig. 4B).
Conservation in the Zona Gene Promoters-Earlier analysis of mouse Zp2 and Zp3 promoters had identified a 12 bp DNA sequence (element IV) that was necessary and sufficient to promote high levels of expression of reporter gene constructs microinjected into growing oocytes. This element binds a putative transcription factor, ZAP-1 (Zona Activating Protein), that is relatively abundant in oocytes but not in eight other tissues, including granulosa cells and testis (20,21). Developmentally, ZAP-1 DNA binding activity is observed prior to birth, when ZP2 transcripts are first detected in primordial mouse oocytes (22).
To investigate the possibility that a common regulatory pathway controls the expression of the three zona genes, we have determined the DNA sequence of 250 bp of the mouse Zp1 promoter. A TATAA box was identified 30 bp upstream of the transcription start site, but no CAAT box was detected. Comparison of the Zp1 promoter sequence with a database of the binding sites for known transcription factors identified a multitude of potential binding sites. However, none were also present at comparable positions in the Zp2 and Zp3 promoters except for a consensus E-box sequence (CANNTG) located at −218 bp from the transcription start site (Fig. 5). In Zp2 and Zp3, the E-box forms the core of the aforementioned element IV. Clustered 6-bp mutations in this element inhibit reporter gene activity and prevent the formation of the ZAP-1 complex (20). The Zp1 E-box (CAgcTG) lies at virtually the same position as the E-box in the Zp2 promoter (−216 bp) and is located similarly to the critical E-box in the Zp3 promoter (−181 bp).
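A scan for the CANNTG consensus can be sketched with a regular expression. The function name `find_eboxes`, the toy sequence, and the convention of reporting the position of the first base of the motif are all assumptions for illustration; the text does not say which base of the E-box the reported coordinates refer to, and the sequence below is not the Zp1 promoter.

```python
import re

# CANNTG, where N is any base; the lookahead also finds overlapping matches.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(promoter: str, tss_index: int):
    """Yield (position relative to the transcription start site, motif).

    tss_index is the 0-based index of the +1 nucleotide in `promoter`.
    Upstream positions are negative and there is no position 0.
    """
    seq = promoter.upper()
    for match in EBOX.finditer(seq):
        offset = match.start() - tss_index
        position = offset if offset < 0 else offset + 1
        yield position, match.group(1)


# Made-up sequence for illustration; the final G stands in for the +1 base.
toy_promoter = "TTTT" + "CAGCTG" + "A" * 20 + "G"
hits = list(find_eboxes(toy_promoter, tss_index=len(toy_promoter) - 1))
```

Running the same scan over the Zp1, Zp2, and Zp3 promoter sequences, each with its own transcription start index, would give the kind of position comparison summarized in the paragraph above.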

DISCUSSION
The zona pellucida is an extracellular matrix that surrounds growing oocytes, ovulated eggs, and preimplantation embryos. The zona is composed of three distinct glycoproteins, each of which is conserved among eutherian mammals (differences in nomenclature complicate correspondence). Several of the genes encoding the zona proteins have been characterized. The exon-intron maps and coding sequence of mouse Zp2, human ZP2, and the pig homologue (4,21,23), and of mouse Zp3, human ZP3, and hamster ZP3 (6,7,24,25) are well conserved. A third human zona gene, ZPB, has recently been reported (16). Because it is distinct from human ZP2 and ZP3, we reasoned that it is the orthologue of mouse Zp1. The recent cloning of mouse ZP1 cDNA from an expression library (10) has confirmed this hypothesis.
A near full-length ZP1 cDNA was used as a probe to isolate Zp1 from a 129/Sv murine genomic library. Mouse Zp1 is composed of 12 exons that span 6.5 kb of DNA. The sizes of the exons are similar to those reported for human ZPB (16), except that exon 3 is considerably larger (364 versus 103 bp, Fig. 1). Alignment of the 623-amino acid polypeptide encoded by mouse Zp1 with the 540-amino acid human ZPB indicates that the additional 83 amino acids in the mouse protein are encoded by the elongated exon 3. Although less conserved than mouse and human ZP2 (61% identity) or ZP3 (67% identity) proteins (21,24), the 42% identity (53% similarity) of the amino acid sequence of mouse ZP1 and human ZPB proteins indicates homology. This conservation, coupled with the maintenance of 19 of 20 cysteine residues, suggests that the three-dimensional structure of the two proteins in their respective zona matrices may be conserved as well.
In addition to conservation among mammals within each class (e.g. ZP1, ZP2 or ZP3), there is evidence of common ancestry between classes. The mouse ZP1 protein contains a 348-amino acid domain that is 47% similar to mouse ZP2 and is encoded by eight exons in both mouse Zp1 and Zp2. A similar domain was first noted by comparing R55 (rabbit orthologue of mouse ZP1) to mouse ZP2, although the genetic locus of the rabbit gene was not reported (18). As is most common in cases of partial gene duplication and exon shuffling (26-29), the 5′ ends of this 348-amino acid domain encoded by mouse Zp1 and Zp2 are bounded by type 1 introns (i.e. the intron begins after the first residue of the codon) and the open reading frame is maintained with the nonconserved exons. The sequence conservation, coupled with the alignment of 10 cysteine residues, suggests that the structural aspects of this domain are similar in the two zona proteins. These data indicate that the eight exons come from a common ancestral gene that has been duplicated in mammals and reutilized by exon shuffling. Although related, each of the mouse zona genes has been mapped to a distinct chromosome. In this manuscript we locate Zp1 to the proximal portion of chromosome 19, 2.1 ± 1.5 cM distal to D19Bir1 (an anonymous DNA fragment). We have previously located mouse Zp2 and mouse Zp3 on chromosome 7 (11.3 ± cM distal to Tyr) and chromosome 5 (9.2 ± 2.9 cM distal to Gus), respectively (30).
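The intron type referred to above (often called intron phase) can be derived from the coding length of the exons that precede each intron. The following sketch is illustrative only: `intron_phases` is a hypothetical helper, and the exon lengths are toy values rather than the Zp1 values of Table I, except that 172 corresponds to the 229-bp first exon minus its 57-bp 5′-untranslated region as given in the Results.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of each intron from the coding lengths of the exons.

    Phase 0: the intron falls between two codons.
    Phase 1: the intron begins after the first nucleotide of a codon.
    Phase 2: the intron begins after the second nucleotide of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


# Toy coding lengths in bp; only the first value is derived from the text.
toy_exons = [172, 119, 90, 100]
phases = intron_phases(toy_exons)
```

An exon flanked by introns of the same phase can be inserted or duplicated without disturbing the downstream reading frame, which is why matching phases at the borders of the shared domain are consistent with exon shuffling.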
Fig. 4. A, primary structure of the 623-amino acid ZP1 protein shown on the first line was deduced from the coding regions of the Zp1 gene isolated from a 129Sv genomic library. The amino acid sequence, represented with single-letter code, is numbered on the right. For comparison, the amino acid sequence deduced from a cDNA clone derived from NIH Swiss mice (10) is shown on the second line. Dashes represent identities between the NIH Swiss and 129Sv mice; the three polymorphisms at positions 246 (Thr → Ala), 445 (Val → Leu), and 486 (Arg → Lys) are indicated by A, L and K, respectively. B, schematic representation of the conserved protein domains between mouse ZP1 (dark rectangle) and ZP2 (light gray rectangle, lower) encoded by exons 5-12 and 11-18, respectively. The 348-amino acid sequence of mouse ZP1 is 47% similar (32% identical) to that of mouse ZP2. A smaller region of the ZP1 polypeptide (275 residues, encoded by exons 5-9 plus exon 11) is 52% similar (36% identical) with a region encoded by exons 2-7 of white flounder wf& (light gray rectangle, upper), an egg envelope protein (5).
A slightly smaller portion of the ZP1/ZP2 domain encoded by Zp1 exons 5-9 plus exon 11 has been identified in the wf& gene of white flounder, a distantly related aquatic vertebrate (5). Although the wf& protein appears to be part of the fish egg envelope, the expression of the wf& gene is restricted to the liver where it is inducible with estrogens. Outside of this 275-amino acid domain, the fish protein is quite dissimilar. It does not include a furin proteolytic site important in the processing of the zona proteins (encoded by the last exon of each mouse zona gene). A recent report has identified a second, different, protein domain present in mammal and fish egg envelope proteins (8). A 261-amino acid sequence present in mouse ZP3 (encoded by Zp3 exons 1-6) is 33% identical with LS-F, a precursor of ZI-3, an egg envelope glycoprotein of O. latipes (medaka).
Like the white flounder protein (from which it is distinct), LS-F transcripts are uniquely present in the liver where they are inducible with estrogen. Thus, although domains of vitelline envelope and zona pellucida proteins have been conserved for at least 650 million years, the control mechanisms for their expression have not. It appears that in fish, the two major glycoproteins are synthesized in the liver and transported to the egg where they form the inner egg envelope. The three mammalian zona proteins (one of which, Zp1, is ancestrally related to a second, Zp2) are synthesized exclusively in the oocyte (10).
Conservation among the zona genes extends to their promoters and may account, in part, for their coordinate and oocyte-specific expression. The sequence of the promoter region of mouse Zp1 was determined and compared with the promoters of mouse Zp2 and Zp3 (4,7). Approximately 200 nucleotides upstream of the transcription start site of each gene is a canonical E-box sequence (CANNTG) (20) that has been described as a binding site for a class of transcription factors known as basic helix-loop-helix proteins (31). These factors commonly bind as heterodimers; one subunit is a ubiquitously expressed protein (E2A, HEB, E2-2), and the other is a tissue-specific protein. Using reporter gene constructs microinjected into growing oocytes, we find that cluster mutations of the E-box in either the Zp2 or Zp3 promoter dramatically reduce reporter gene expression. Using gel mobility shift assays with synthetic oligonucleotides (40 bp) centered on the CANNTG binding site of either the Zp2 or Zp3 promoter, we can detect ZAP-1 in oocytes but not granulosa cells (20). The appearance of the ZAP-1 complex in oocytes is coincident with the detection of ZP2 transcripts in the prenatal ovary (22). It will be of interest to determine if similar investigations detect functional ZAP-1 binding to the E-box in the mouse Zp1 promoter.