Structural Characterization of Native Mouse Zona Pellucida Proteins Using Mass Spectrometry*

The zona pellucida is an extracellular matrix consisting of three glycoproteins that surrounds mammalian eggs and mediates fertilization. The primary structures of mouse ZP1, ZP2, and ZP3 have been deduced from cDNA. Each has a predicted signal peptide and a transmembrane domain from which an ectodomain must be released. All three zona proteins undergo extensive co- and post-translational modifications important for secretion and assembly of the zona matrix. In this report, native zonae pellucidae were isolated and structural features of individual zona proteins within the mixture were determined by high resolution electrospray mass spectrometry. Complete coverage of the primary structure of native ZP3, 96% of ZP2, and 56% of ZP1, the least abundant zona protein, was obtained. Partial disulfide bond assignments were made for each zona protein, and the size of the processed, native protein was determined. The N termini of ZP1 and ZP3, but not ZP2, were blocked by cyclization of glutamine to pyroglutamate. The C termini of ZP1, ZP2, and ZP3 lie upstream of a dibasic motif, which is part of, but distinct from, a proprotein convertase cleavage site. The zona proteins are highly glycosylated and 4/4 potential N-linkage sites on ZP1, 6/6 on ZP2, and 5/6 on ZP3 are occupied. Potential O-linked carbohydrate sites are more ubiquitous, but less utilized.

The zona pellucida is an extracellular matrix surrounding mammalian eggs that functions in taxon-specific gamete binding, provides a post-fertilization block to polyspermy, and protects the developing pre-implantation embryo (1-3). The mouse zona pellucida (ZP)¹ is composed of three major glycoproteins (ZP1, ZP2, and ZP3) that are synthesized and secreted by oocytes during a 2-3 week growth period (4). The primary structures of ZP1 (623 amino acids), ZP2 (713 amino acids), and ZP3 (424 amino acids) have been deduced from cDNA (5-7). Each glycoprotein has a signal peptide directing it into a secretory pathway, a ~260-amino acid zona domain containing 8 conserved cysteine residues, and a transmembrane domain near the C terminus followed by a short cytoplasmic tail (8). The zona domain has been observed in multiple proteins (9) and has been implicated in the polymerization of extracellular matrices (10).
During oocyte growth, ZP1, ZP2, and ZP3 traffic through the growing oocyte, and their ectodomains are released from a transmembrane domain at the surface of the cell (11,12). A conserved hydrophobic patch upstream of the transmembrane domain is required for progression to the cell surface,² and a consensus cleavage site (RX(K/R)R↓) for the proprotein convertase furin is present upstream of the transmembrane domain. Although this site has been implicated in the release of the zona ectodomain (13-15), mutations (RNRR → ANAA or RNRR → ANGE) do not prevent incorporation of reporter-ZP3 proteins into the zona pellucida in growing oocytes (12,16) or transgenic mice (12), and secretion of recombinant human ZP3 with a similar mutation (RNRR → ANAA) is not prevented (17).
The three zona proteins are extensively co- and post-translationally modified, and a detailed structural analysis of mouse zona pellucida glycans has been reported (18). These observations are of particular interest because of the proposal that sperm bind to ZP3 O-glycans linked to Ser³³² and Ser³³⁴, and the corollary that their removal by glycosidases released from egg cortical granules prevents sperm binding after fertilization (19). However, there has been controversy as to the nature of the glycans involved, and the candidacy of individual terminal sugars as sperm receptors has not been supported by targeted null mutations in mice (8,18). Moreover, recent genetic studies suggest that sperm binding to the zona pellucida is predicated on the three-dimensional structure of the zona pellucida matrix rather than a specific carbohydrate side chain. Cleavage of ZP2 by a protease released during cortical granule exocytosis that occurs upon fertilization may be sufficient to modify the supramolecular structure of the zona matrix and render it nonpermissive to sperm binding (20).
Many of these controversies stem from the paucity of biological material, which makes robust biochemical analysis difficult and has prompted reliance on recombinant zona proteins expressed in heterologous systems where processing and modifications may differ from those in mouse oocytes. This report takes advantage of microscale LC-MS to partially characterize mouse ZP1, ZP2, and ZP3 as a mixture in native zonae pellucidae. A hybrid QTOF instrument offers high mass accuracy, sensitivity, and resolution and is well suited for detection of low levels of biological materials. Using these technologies we have determined the N and C termini and intramolecular disulfide linkages and have identified N- and O-glycosylation sites on mouse ZP1, ZP2, and ZP3.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
searches were performed to consider irregular cleavages and posttranslational modifications. In addition, manual data analysis in search of specific ions of interest was carried out. All MS/MS fragment ions were within 50 ppm of their theoretical values determined by the BioLynx Protein/Peptide Editor and most were within 10 ppm.
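The mass-accuracy criterion quoted above (fragment ions within 50 ppm of theory, most within 10 ppm) can be illustrated with a minimal sketch. The function names and the example m/z values are hypothetical and are not taken from the acquired spectra or from the BioLynx software.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed ion, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 50.0) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical fragment ion: observed vs. theoretical m/z (illustrative only)
observed, theoretical = 500.2650, 500.2600
print(f"{ppm_error(observed, theoretical):.1f} ppm")
print(within_tolerance(observed, theoretical))        # 50 ppm window
print(within_tolerance(observed, theoretical, 10.0))  # 10 ppm window
```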
Gel Electrophoresis and Western Blotting-Zona proteins were solubilized in 2× denaturing and reducing Laemmli sample buffer (25) and separated by one-dimensional SDS-PAGE on a 4-20% NOVEX Tris-glycine gel at 120 V. The proteins were then electroblotted onto a NitroPure-supported nitrocellulose membrane (45-μm pore diameter; OSMONICS INC., Westborough, MA) at 25 V for 1 h. Nonspecific binding was blocked by incubating the nitrocellulose in phosphate-buffered saline containing 0.1% Tween-20 and 10% nonfat dried milk for 1 h at room temperature. Proteins were immunoblotted overnight at 4°C in the same blocking solution containing one of the following rat monoclonal antibodies: anti-ZP1 (m1.4, 1:100 hybridoma supernatant) (26), anti-ZP2 (IE3, 1:100 hybridoma supernatant) (27), or anti-ZP3 (IE10, 1:1000 IgG fraction isolated from hybridoma supernatant) (28). The blots were washed three times (15 min each) with phosphate-buffered saline containing 0.1% Tween-20 and then incubated with an anti-rat secondary IgG conjugated to horseradish peroxidase for 1 h at room temperature. The blots were washed again, and bands were visualized by enhanced chemiluminescence (ECL) according to the manufacturer's instructions (Amersham Biosciences).

RESULTS
Preliminary Analysis of the Zona Pellucida-Mass spectrometric analyses were performed on native zonae pellucidae isolated from 500 NIH Swiss mice and purified by density gradient centrifugation. Monoclonal antibodies that recognize peptide epitopes detected mouse ZP1 (average molecular mass, 132 kDa), ZP2 (120 kDa), and ZP3 (79 kDa) on immunoblots after samples had been reduced and alkylated (data not shown). Following treatment with PNGase F to remove N-linked glycans, there was a dramatic shift in the apparent molecular mass of ZP1 (132 kDa → 105 kDa), ZP2 (120 kDa → 68 kDa), and ZP3 (79 kDa → 44 kDa), similar to those reported earlier for ZP2 and ZP3 (29). Additional treatment with a mixture of exo- and endo-O-glycosidase resulted in a less diffuse band for ZP1 and ZP3 and a further shift in average molecular masses to 63 and 39 kDa, respectively. However, there was no apparent shift in the molecular mass of ZP2, confirming previous observations (29). Although glycoproteins run anomalously on SDS-PAGE (30), these results suggest that ZP1 is more heavily O- than N-glycosylated, ZP2 is predominantly N-glycosylated with little or no O-glycosylation, and ZP3 is predominantly N-glycosylated with relatively little O-glycosylation.
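The arithmetic behind this comparison can be laid out as a short sketch. The dictionary layout and function name are illustrative; the ZP2 value after O-deglycosylation is entered as unchanged because no shift was apparent, which is an assumption about how to encode that observation. SDS-PAGE masses of glycoproteins are approximate, so the differences are only rough estimates.

```python
# Apparent molecular masses (kDa) from the immunoblots described above.
apparent_kda = {
    "ZP1": {"native": 132, "minus_N": 105, "minus_N_and_O": 63},
    # No apparent shift for ZP2 after O-deglycosylation (encoded as unchanged).
    "ZP2": {"native": 120, "minus_N": 68, "minus_N_and_O": 68},
    "ZP3": {"native": 79, "minus_N": 44, "minus_N_and_O": 39},
}


def glycan_shifts(masses: dict) -> tuple:
    """Return (N-glycan shift, O-glycan shift) in kDa for one protein."""
    n_shift = masses["native"] - masses["minus_N"]
    o_shift = masses["minus_N"] - masses["minus_N_and_O"]
    return n_shift, o_shift


for name, masses in apparent_kda.items():
    n_shift, o_shift = glycan_shifts(masses)
    print(f"{name}: N-glycan shift {n_shift} kDa, O-glycan shift {o_shift} kDa")
```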
Each sample analyzed by mass spectrometry was a mixture of zona proteins with ZP2 and ZP3 present in approximately equal amounts and ZP1 much less abundant (31). Using a combination of proteolytic enzymes before and after enzymatic deglycosylation, 56% of the polypeptide chain of mature ZP1 (see Supplemental Table IA), 96% of mature ZP2 (see Supplemental Table IB), and 100% of mature ZP3 (see Supplemental Table IC) was identified by mass spectrometry. Although they were looked for, two or more ions ascribable to any other known protein were not observed in the zona preparation, with the exception of clusterin/apolipoprotein J/sulfated glycoprotein 2 from Mus musculus (32). This protein, implicated in cell-cell adhesions of epithelial tissues including the early embryo, was identified by CID spectra of two peptides, ³⁸⁵VSTVTTHSSDSEVPSR⁴⁰⁰ and ⁴⁰¹VTEVVVK⁴⁰⁷. Whether clusterin participates in the zona pellucida matrix or its presence reflects a minor contamination of the zona preparation remains to be determined.
Determination of the N Termini of ZP1, ZP2, and ZP3-Virtually all extracellular proteins have N-terminal signal peptides that direct them into secretory pathways and are removed in the endoplasmic reticulum by signal peptidases. A predictive algorithm (33) places the cleavage sites of ZP1, ZP2, and ZP3 immediately upstream of Gln²¹, Val³⁵, and Gln²³, respectively. Edman degradation sequencing confirmed the N terminus of ZP2 (6) but was either imprecise for ZP1 (7) or uninformative for ZP3 (5).
Peptide mapping of ZP1 from Asp-N digestion followed by LC-MS indicated that the N terminus starts at Gln²¹, which had been converted to pyroglutamate. The CID spectrum (Fig. 1A) of the precursor ion at m/z 811.37²⁺ (inset, calc. 811.39²⁺), corresponding to the mass of the N-terminal peptide ²¹qRLHLEPGFEYSY³³ (q = pyroglutamate), indicated the presence of both y and b ion series including y₁₋₂, y₂-H₂O, y₇, b₂₋₆, b₈₋₁₀, b₂-NH₃, b₄-NH₃, and b₆-NH₃. In addition, an a ion series (a₅₋₆, a₉, and a₁₁) as well as immonium ions of tyrosine and phenylalanine were observed. MS data from the combined trypsin/Asp-N digestion revealed the presence of the [M+2H]²⁺ ion at m/z 915.45 (inset, calc. 915.46²⁺) corresponding to the N-terminal carbamidomethylated peptide ³⁵VSLPQSENPAFPGTLIC⁵¹ of ZP2 (Fig. 1B). The CID spectrum of this ion generated many internal fragment ions (PG, PQ, PGT, PGTLI, PQSENPAF, etc.) near proline residues and, together with sequence ions y₁, y₂, y₆, a₄, b₂-H₂O, b₇-NH₃, and b₁₁, confirmed its identity.
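The calculated m/z values quoted for these N-terminal peptides can be reproduced with a minimal sketch based on standard monoisotopic residue masses. The function and constant names are illustrative, and this is not the software used in the study; the modification deltas are the usual values for cyclization of N-terminal Gln (loss of NH₃) and carbamidomethylation of Cys.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
PYROGLU_FROM_GLN = -17.02655  # loss of NH3 when N-terminal Gln cyclizes
CARBAMIDOMETHYL = 57.02146    # added to each alkylated Cys


def peptide_mz(sequence: str, charge: int,
               n_term_pyroglu: bool = False,
               alkylated_cys: bool = False) -> float:
    """Monoisotopic m/z of a peptide at the given positive charge state."""
    mass = sum(RESIDUE[aa] for aa in sequence) + WATER
    if n_term_pyroglu:
        if sequence[0] != "Q":
            raise ValueError("pyroglutamate conversion assumes N-terminal Gln")
        mass += PYROGLU_FROM_GLN
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return (mass + charge * PROTON) / charge


# ZP1 N-terminal peptide 21-33 with pyroglutamate (text: calc. 811.39, 2+)
print(round(peptide_mz("QRLHLEPGFEYSY", 2, n_term_pyroglu=True), 2))
# ZP2 N-terminal peptide 35-51, Cys carbamidomethylated (text: calc. 915.46, 2+)
print(round(peptide_mz("VSLPQSENPAFPGTLIC", 2, alkylated_cys=True), 2))
```

The same function applies to unmodified peptides; for example, the ZP1 peptide DSGIARR at charge 1 gives a value that agrees with the MH⁺ of 774.42 reported in the C-terminal analysis.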
For mouse ZP3, tryptic digestion revealed [M+3H]³⁺ and [M+4H]⁴⁺ ions at m/z 702.42 and 527.06 that match the N-terminal peptide ²³qTLWLLPGGTPTPVGSSSPVK⁴³, again with a pyroglutamate in place of a glutamine (Fig. 1C). Unfortunately, the low abundance of these multiply charged ions prevented them from being selected for fragmentation (CID). Furthermore, the high charge state of this peptide is unusual since there is only one basic lysine residue. However, gas phase basicity can promote proton trapping by proline, tryptophan, and glutamine (34,35) and may account for these observations.
Determination of the C Termini of ZP1, ZP2, and ZP3-A potential proprotein convertase (furin) cleavage site (RX(R/K)R↓) that lies 35-40 amino acids N-terminal of the transmembrane domain is conserved among the mouse zona proteins and has been implicated in the release of the mature zona ectodomain (13). Because trypsin cuts within the furin site and could have provided ambiguous results, samples were digested with Asp-N. MS data were obtained from both N-deglycosylated and N/O-deglycosylated zonae pellucidae. For mouse ZP1, we observed a peptide of MH⁺ 774.42 Da corresponding to the sequence ⁵⁴⁰DSGIARR⁵⁴⁶ both as a +1 (calc. 774.42¹⁺) and a +2 charged ion at m/z 387.72 (Fig. 2A). This indicates that the C terminus of mouse ZP1 (Arg⁵⁴⁶) lies two amino acids upstream of the furin cleavage site. Because of the low abundance of these ions, CID data were not obtained.
For ZP2, Asp-N digestion and LC-MS data revealed the presence of a precursor ion of MH⁺ 1649.76 representing the C-terminal peptide ⁶¹⁹DSPLCSVTCPASLRS⁶³³ where Cys⁶²³ and Cys⁶²⁷ were both carbamidomethylated (calc. MH⁺ 1649.76). The CID spectrum of the +2 charged ion of this peptide at m/z 825.38 confirmed the identity of the peptide through the b ion series of peptide fragments (b₂, b₃-H₂O, b₄, b₄-H₂O), as well as the y ion series (y₆-y₁₂, y₆-NH₃, y₉-NH₃, y₁₀-H₂O) (Fig. 2B). Hence, the C terminus of ZP2 (Ser⁶³³) also lies two amino acids upstream of the furin cleavage site.
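The relation between a singly protonated mass (MH⁺) and the m/z of its multiply charged ions, used repeatedly in this section, is simple enough to sketch. The function name is illustrative; the only constant is the proton mass.

```python
PROTON = 1.00728  # mass of a proton (Da)


def mz_from_mh(mh_plus: float, charge: int) -> float:
    """Convert a singly protonated mass (MH+) to m/z at another charge state."""
    neutral = mh_plus - PROTON
    return (neutral + charge * PROTON) / charge


# ZP2 C-terminal peptide: MH+ 1649.76, observed as a +2 ion at m/z 825.38
print(round(mz_from_mh(1649.76, 2), 2))
# ZP1 C-terminal peptide: MH+ 774.42, observed as a +2 ion at m/z 387.72
print(round(mz_from_mh(774.42, 2), 2))
```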
ZP3, in which there was no convenient aspartate residue, was digested with PNGase F, which released protein-bound N-glycans and converted Asn³³⁰ to aspartic acid. Subsequent Asp-N digestion and LC-MS revealed the presence of the C-terminal peptide ³³⁰DSSSSQFQIHGPRQWSKLVSRN³⁵¹ (Fig. 2C), and its identity was confirmed by CID (y₃-y₆, y₁₂²⁺, y₁₃²⁺, y₁₅²⁺, y₁₆²⁺ as well as a₂, b₂, b₂-H₂O, b₃-H₂O, b₄-H₂O). Thus, the C terminus of ZP3 lies at Asn³⁵¹. Taken together, these mass spectrometric data indicate that the primary cleavage site of native ZP1, ZP2, and ZP3 lies N-terminal to a dibasic motif that is part of, but distinct from, the proprotein convertase (furin) cleavage site.
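The way PNGase F creates a new Asp-N cleavage site can be illustrated with a small in-silico sketch. The sequence for residues 306-351 is assembled here from peptides quoted in this report, and the conversion of both Asn³²⁷ and Asn³³⁰ follows the glycosylation results reported below; the function names are hypothetical, and the digest is idealized (cleavage N-terminal to every Asp, no missed or secondary cleavages).

```python
def pngase_f(sequence: str, glycosylated_positions, first_residue: int = 1) -> str:
    """Convert the listed (formerly N-glycosylated) Asn residues to Asp."""
    residues = list(sequence)
    for pos in glycosylated_positions:
        i = pos - first_residue
        if residues[i] != "N":
            raise ValueError(f"residue {pos} is {residues[i]}, not Asn")
        residues[i] = "D"
    return "".join(residues)


def asp_n_digest(sequence: str, first_residue: int = 1):
    """Idealized Asp-N digest: cut N-terminal to each Asp.

    Returns a list of (start, end, fragment) using protein numbering.
    """
    cuts = [i for i, aa in enumerate(sequence) if aa == "D" and i > 0]
    starts = [0] + cuts
    ends = cuts + [len(sequence)]
    return [(s + first_residue, e + first_residue - 1, sequence[s:e])
            for s, e in zip(starts, ends)]


# Mouse ZP3 residues 306-351, assembled from peptides quoted in the text.
zp3_306_351 = "TSQSWLPVEGDADICDCCSHGNCSNSSSSQFQIHGPRQWSKLVSRN"

untreated = asp_n_digest(zp3_306_351, first_residue=306)
deglycosylated = pngase_f(zp3_306_351, [327, 330], first_residue=306)
fragments = asp_n_digest(deglycosylated, first_residue=306)

print(untreated[-1])   # long C-terminal fragment without PNGase F
print(fragments[-1])   # C-terminal peptide starting at the new Asp-330
```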
FIG. 2 (caption fragment). Asp-N cleavage specific at the N terminus of an aspartic acid residue followed by LC-MS analysis identified the C termini as the amino acid preceding a dibasic peptide motif upstream of the furin consensus cleavage site in all three cases. A, the C terminus of ZP1 as defined by the +1 and +2 charged ions at m/z 774.42
Disulfide Linkage Mapping-Blocking with 4-VP at pH 7.2 revealed no S-pyridylethylated cysteine-containing peptides in the mixture, suggesting that all cysteines (at least those detected in the digest) participate in disulfide bonding. In the following discussion, the two disulfide-bonded peptide chains have been arbitrarily designated as P1 and P2, with fragments that arise from the latter marked by a prime, e.g. y′. Because the disulfide bridge is sometimes "reductively" cleaved either between or on each side of sulfur, peptide fragment ions will appear carrying either an SH or SSH at the cysteine site, and these are referred to as yʳ (or y′ʳ) and yᵈ (or y′ᵈ), respectively.
ZP1 forms a homodimer in the native zona pellucida. It has 21 cysteine residues and the potential to form 10 intramolecular disulfide bonds with the remaining cysteine residue available for intermolecular ZP1-ZP1 linkage. However, because of the low abundance of ZP1 in the zona protein mixture, only one disulfide-bonded peptide was detected. The low abundance +3 and +4 charged ions at m/z 1351.05 and 1013.50 observed after trypsin digestion arose from ⁴³⁸TDPSLVLLLHQCWATPTTSPFEQPQWPILSDGCPFK⁴⁷³ intramolecularly disulfide-bonded between Cys⁴⁴⁹ and Cys⁴⁷⁰ (Fig. 3A and Table I). No CID spectra were obtained, and as expected, both ions disappeared after treatment with tris(2-carboxyethyl)phosphine hydrochloride (TCEP) for 1 h. Unfortunately, the reduced ion 2 Da higher was not available to corroborate the reduction.
ZP2 has 20 cysteine residues capable of forming 10 disulfide bonds. Within the zona domain (containing ten cysteines, eight of which are conserved) four out of five possible disulfide bonds were identified (Table I). These linkages were confirmed by observing the disappearance of the disulfide-bridged ions described below upon TCEP treatment and/or by sequence obtained from CID. Cys³⁶⁵/Cys⁴⁵⁷ formed a disulfide pair as observed by ions at m/z 696.83²⁺ (calc. 696.80²⁺) and 464.88³⁺ (calc. 464.87³⁺) derived from the trypsin/Asp-N digest (data not shown). The calculated MH⁺ of the S-S linked peptides ³⁶²DELCAQ³⁶⁷ (P1) and ⁴⁵⁷CYYIR⁴⁶² (P2) is 1392.59 Da, which is in good agreement with our experimental values. The CID spectrum of 464.88³⁺ generated partial sequence ions of y₁₋₂ and b₂ from P1, as well as y′₁ and the immonium ion of tyrosine residues from P2 (data not shown).
The Cys³⁹⁶/Cys⁴¹⁷ disulfide pair in ZP2 was observed as a very low abundance +4 charged ion at m/z 836.41 (MH⁺ 3342.72). This ion, derived from trypsin digestion, corresponds to the peptide ³⁸²PALNLDTLLVGNSSCQPIFK⁴⁰¹ (Asn to Asp conversion at position 393 after PNGase F treatment) joined with ⁴¹⁰FHIPLNGCGTR⁴²⁰ via an S-S bond (combined masses of the two peptides minus 2 Da). Although the CID spectrum of this ion was unavailable, 836.41⁴⁺ disappeared after TCEP reduction. Furthermore, two ions appeared at m/z 1066.05²⁺ and 607.81²⁺ that correspond to ³⁸²PALNLDTLLVGNSSCQPIFK⁴⁰¹ (Asn³⁹³ → Asp³⁹³) and ⁴¹⁰FHIPLNGCGTR⁴²⁰ in their reduced state. This observation adds confidence in the assignment of this disulfide linkage even without CID data.
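The consistency check described here (two reduced peptides accounting for one disulfide-linked precursor) can be sketched directly from the observed ions. The function names are illustrative, and the calculation assumes monoisotopic peaks and the loss of two hydrogen atoms per disulfide bond; agreement to within a few hundredths of an m/z unit is the most that should be expected from values rounded to two decimals.

```python
PROTON = 1.00728     # proton mass (Da)
HYDROGEN = 1.007825  # hydrogen atom mass (Da); each S-S bond costs two


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass from an observed m/z and charge."""
    return mz * charge - charge * PROTON


def disulfide_linked_mz(reduced_ions, charge: int, n_bonds: int = 1) -> float:
    """Predicted m/z of the S-S linked species from its reduced components.

    reduced_ions: iterable of (m/z, charge) for each reduced peptide.
    """
    total = sum(neutral_mass(mz, z) for mz, z in reduced_ions)
    linked = total - 2 * HYDROGEN * n_bonds
    return (linked + charge * PROTON) / charge


# Reduced ZP2 peptides observed after TCEP treatment (both +2)
reduced = [(1066.05, 2), (607.81, 2)]
predicted = disulfide_linked_mz(reduced, charge=4)
print(round(predicted, 2), "vs. observed 836.41 (+4)")
```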
Two more disulfide links in ZP2 provided +3 and +4 charged ions at m/z 1198.59 and 899.19 (MH⁺ 3593.73), which correspond to the intramolecularly disulfide-bonded peptide ⁵⁹⁹GLSSLIYFHCSALICNQVSLDSPLCSVTCPASLR⁶³² formed between the four cysteines within the same tryptic peptide (2 disulfide bonds with a loss of 4 Da). The CID spectrum of 1198.93³⁺ did not generate many sequence ions, as expected from its size and the two internal cystine linkages. Thus, the actual disulfide pairing among these four cysteines was indeterminate from trypsin digestion alone. However, this problem was resolved when additional Asp-N cleavage revealed the presence of the peptide ⁶¹⁹DSPLCSVTCPASLR⁶³² linked via Cys⁶²³/Cys⁶²⁷, as detected by ions at m/z 723.87²⁺ and traces of 482.92³⁺. This linkage was corroborated by the disappearance of the ion at m/z 723.87²⁺ after TCEP reduction and the appearance of an ion at m/z 724.81²⁺ corresponding to the above peptide with its free sulfhydryl groups. Thus, the second disulfide linkage must join Cys⁶⁰⁸ and Cys⁶¹³.
A disulfide bond between Cys⁸⁴ and Cys¹⁰², near the N terminus of ZP2 and outside the zona domain, was also identified.
The mature mouse ZP3 amino acid sequence is essentially a compact zona domain. There are 12 cysteines in the mature form with four of them clustered near the C terminus outside the zona domain (Table I). In the first pair, masses corresponding to the peptide ⁴⁴VECLEAELVVTVSR⁵⁷ disulfide-linked to ¹³³VEVPIECR¹⁴⁰ (with loss of 2 Da) were observed at m/z 622.83⁴⁺ and 830.12³⁺ from both the trypsin only and the Asp-N/trypsin double digest (data not shown). These ions, however, were not selected for fragmentation by the software. After reduction, these ions vanished while ions at m/z 773.91²⁺ and 472.75²⁺, corresponding to the two reduced peptides, respectively, were detected. In the second pair, a precursor ion of MH⁺ 3058.45, detected as +3 and +4 charged ions at m/z 1020.16 and 765.37, corresponds to ⁶⁵LVQPGDLTLGSEGCQPR⁸¹ (P1) disulfide-bridged to ⁹¹FNAQLHECSSR¹⁰¹ (P2). The CID spectrum of m/z 765.62⁴⁺ yielded a y ion series including y₁₋₃ ions prior to and y₅ʳ past Cys⁷⁸ of P1, as well as the b₂₋₄ ions (Fig. 3C). In addition, P2 generated y′₁₋₃ ions, followed by y′₇ʳ past Cys⁹⁸, and sequential b ions including b′₂₋₅, b′₇, and b′₈ʳ. This disulfide linkage was further confirmed by the results from Asp-N/trypsin double digestion. The ions at m/z 855.39³⁺ and 641.78⁴⁺ correspond to the mass of two peptides linked via an S-S bond (MH⁺ 2564.16
N* represents an originally N-glycosylated asparagine residue converted to an aspartic acid upon PNGase F treatment.
Since six out of a total of eight cysteines in mouse ZP3 had been accounted for in disulfide bonding, it seemed reasonable that the last linkage would be between ²³⁶DFHGCLV²⁴² and ³⁰⁰ACSF³⁰³. However, ions corresponding to this linkage, calculated as MH⁺ 1214.50 Da, were not detected in the double digest sample. A +1 charged ion at m/z 427.16 corresponding to the mass of ³⁰⁰ACSF³⁰³ was detected only after TCEP reduction and was absent from the non-reduced digest. Similarly, the +1 and +2 charged ions at m/z 790.31 and 395.66, which correspond to ²³⁶DFHGCLV²⁴², were detected only after TCEP treatment. This observation implies that Cys³⁰¹ was originally linked to Cys²⁴⁰.
Lastly, the C-terminal peptide ³⁰⁶TSQSWLPVEGDADICDCCSHGNCSNSSSSQFQIHGPR³⁴² with two asparagines at positions 327 and 330 converted to aspartic acids (see below) was internally disulfide-bonded twice, as demonstrated by the presence of a +4 charged ion at m/z 988.41 (MH⁺ 3950.68). The conversion of two asparagines to aspartic acids resulted in a mass increase of 1.97 Da; however, the loss of 4.03 Da from formation of two disulfide bridges caused a net decrease of 2.06 Da (or 0.51 Da for a +4 charged ion). The CID spectrum of 988.90⁴⁺ identified the y ion series including y₁₋₇, y₄-NH₃, y₇-NH₃, y₉, and y₁₂, together with b₆-H₂O and some internal fragments such as PV and HG (data not shown). The immonium ions of glutamine, histidine, and tryptophan residues at m/z 101.07, 110.07, and 159.09 were also observed, although the exact cystine bridging among this group of four cysteines could not be determined directly. The disappearance of 988.41⁴⁺ together with the detection of 989.40⁴⁺ (MH⁺ 3954.61) after TCEP reduction further confirmed these disulfide linkages.
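The net mass shift quoted for this peptide combines two opposing modifications, and the bookkeeping can be written out as a short sketch. The constant names are illustrative; the deltas are the standard monoisotopic values for Asn to Asp conversion and for loss of two hydrogens per disulfide bond.

```python
ASN_TO_ASP = 0.98402   # deamidation delta (Da): side-chain NH2 replaced by OH
DISULFIDE = -2.01565   # delta (Da) per S-S bond: loss of two hydrogen atoms


def net_shift(n_deamidations: int, n_disulfides: int) -> float:
    """Net monoisotopic mass change relative to the unmodified, reduced peptide."""
    return n_deamidations * ASN_TO_ASP + n_disulfides * DISULFIDE


shift = net_shift(n_deamidations=2, n_disulfides=2)
per_charge = shift / 4  # shift in m/z for a +4 ion

print(f"net shift: {shift:.2f} Da; m/z shift at +4: {per_charge:.2f}")
```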
N-linked Glycosylation Sites-N-glycosylation of proteins occurs only at asparagine residues within the consensus sequence NX(S/T) where X cannot be a proline. PNGase F endoglycosidase releases protein-bound N-linked glycans and, by converting the involved asparagine residue to an aspartic acid, provides a signature increase in mass (0.98 Da). There are four predicted N-linked glycosylation sites that follow the NX(S/T) sequence motif in native secreted ZP1 and six in both ZP2 and ZP3. In ZP1, all four predicted asparagines at positions 49, 68, 240, and 371 were N-glycosylated within the mature protein (Table II). Fig. 4A provides an example of the CID spectrum of a +3 charged Cys-carbamidomethylated peptide ³⁶⁸CIFNASDFLPIQASIFSPQPPAPVTQSGPLR³⁹⁸ at m/z 1119.53 (MH⁺ = 3356.57) derived from trypsin digestion. The MH⁺ ion of this peptide is 0.87 Da higher than the expected value (MH⁺ = 3355.70), suggesting that Asn³⁷¹ was converted to Asp. Fragmentation generated a series of b ions (b₂-b₁₂ and b₁₄-b₁₆), as well as a y ion series including y₄-y₅, y₇, y₉-y₁₆, y₁₃-H₂O, y₁₄-NH₃, y₉²⁺, y₁₂²⁺, y₁₄²⁺, y₁₅²⁺, y₁₆²⁺, y₂₂²⁺, and y₂₃²⁺, confirming the peptide sequence. The b₄₋₉, b₁₁₋₁₂, and b₁₄₋₁₆ ions clearly demonstrated a change of Asn to Asp at position 371 upon PNGase F treatment. In order to obtain more sequence information, additional proteolytic cleavages were subsequently carried out.
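The NX(S/T) rule stated above is easy to express as a pattern search. The following is a minimal sketch with an illustrative function name; it only locates candidate sequons and says nothing about whether a site is actually occupied.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead keeps matches one residue wide so adjacent sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str, first_residue: int = 1):
    """Return protein-numbered positions of Asn residues in NX(S/T) sequons."""
    return [m.start() + first_residue for m in SEQUON.finditer(sequence)]


# ZP1 tryptic peptide 368-398 quoted in the text
peptide = "CIFNASDFLPIQASIFSPQPPAPVTQSGPLR"
print(find_sequons(peptide, first_residue=368))

# A proline at the X position disqualifies the site
print(find_sequons("ANPSA"))
```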
In ZP2, all six N-glycosylation sites were occupied (Table III). Trypsin digestion after PNGase F treatment clearly showed that four Asn residues at positions 83, 172, 184, and 393 were converted to Asp. In Fig. 4B … 12-13, and a₁₄ ions further confirm the sequence identity and demonstrate that Asn⁷⁰, which precedes a proline, was not N-glycosylated, as predicted. In addition, the +2 charged ion of this peptide at m/z 1383.65 was observed (0.95 Da higher than the calculated mass), and its CID spectrum showed a fragmentation pattern very similar to that of the +3 charged ion (data not shown).
As a result of Asp-N as well as trypsin/Asp-N sequential digestion, Asn²¹⁷ and Asn²⁶⁴ were also identified as N-glycosylation sites in ZP2. In the case of Asn²⁶⁴, a peptide ²⁶⁴NATHMTLTIPEFPGK²⁷⁸ resulting from the double digest was detected at m/z 829.41 (+2 charged) and 553.27 (+3 charged), a mass increase of 0.98 Da that was further confirmed by the CID spectrum of 829.41²⁺ (data not shown). This conversion led to Asp-N cleavage at position 264, which allowed detection of this peptide. The same observation was made with a peptide ²¹⁷NATGIVHYVQESSYLYTVQLELLFSTTGQK²⁴⁶ at m/z 1130.89 (+3 charged) derived from the sequential digest, which resulted from the Asn-Asp conversion at position 217 that permitted Asp-N cleavage. In both cases, a mass increase of 0.93-0.98 Da was noted from the conversion.
Similarly, trypsin digestion of ZP3 generated five out of six Asp-containing peptides after PNGase F deglycosylation (Table IV). A +4 charged ion at m/z 1046.43 indicates that the C-terminal peptide ³⁰⁶TSQSWLPVEGDADICDCCSHGNCSNSSSSQFQIHGPR³⁴² was N-glycosylated at both Asn³²⁷ and Asn³³⁰. Interestingly, the observation of a second co-eluting ion at m/z 1046.16⁴⁺ implies the presence of another population of the same peptide N-glycosylated at either Asn³²⁷ or Asn³³⁰ (data not shown). However, no CID data were available to locate the precise glycosylated site on the second ion. The very large tryptic peptide fragment from residue 185 to 256 encompassing the predicted N-glycosylation site Asn²²⁷ was not detected. To obtain additional information on this middle region of the ZP3 sequence, Asp-N as well as trypsin/Asp-N sequential digestion was performed. Two Asp-N fragments, ²¹⁴DHCVATPSPLPDPNSSPYHFIV²³⁵ and ²²⁵DPNSSPYHFIV²³⁵ at m/z 817.37³⁺ and 638.29²⁺, the masses of which match the calculated values of these peptides (MH⁺ 2450.14 and 1275.60), clearly indicated the absence of a predicted N-linked Asn²²⁷ residue. Asn³⁰⁴ was found to be N-glycosylated from the tryptic peptide ³⁰⁰ACSFNK³⁰⁵ showing a mass shift of +0.98 Da and confirmed by the CID spectrum of its +2 charged ion at m/z 364.16 (data not shown). Further confirmation for this N-linked asparagine site came from Asp-N digestion where a peptide ²⁹⁵DKLNKACSF³⁰³ was observed at m/z 541.76²⁺ due to the generation of a new cleavage site at position 304 after PNGase F deglycosylation. … Fig. 4C, low energy CID generated the sequential y ion series y₁-y₁₀ and y₆-NH₃ ions, as well as b₂₋₃ and b₂-H₂O ions. Hence, the N-terminal Asn³³⁰ of this ZP3 peptide was unambiguously assigned as an N-glycosylation site.
O-linked Glycosylation Sites-Although O-glycans attach to threonines and serines, there is no specific consensus sequence to readily predict potential linkage sites. Instead, monosaccharides must be removed by a series of exoglycosidases (sialidase A, β-(1-4)-galactosidase, β-N-acetylglucosaminidase) until only the Galβ(1-3)GalNAc core remains attached to the serine/threonine residues. This results in a mass increase of 365.13 Da/core glycan over the basic peptide. Further O-deglycosylation with endo-O-glycosidase removes the core sugar, leaving serine and threonine residues unmodified. Shifts in mobility on SDS-PAGE after deglycosylation suggest that ZP1 contains considerably more O-linked carbohydrate side chains than either ZP2 or ZP3 (data not shown) (29), although estimates of glycosylation based on SDS-PAGE are inexact (30). However, because of its low abundance, no mass spectrometric data were obtained on ZP1 O-linked glycosylation. Based on the near complete coverage of ZP2 prior to enzymatic removal of O-linked carbohydrates (96%), there appears to be only one potential O-linkage site (Thr⁴⁵⁵). The absence of a significant shift in apparent molecular mass in SDS-PAGE after enzymatic removal of O-linked glycans suggests that few, if any, serine/threonine residues are occupied or that occupancy is below our detection limit (data not shown) (29).
Two ZP3 domains were identified that contain one or more O-linked oligosaccharide side chains: one at the N terminus (residues 23-43 with five potential sites) and the other within the zona domain (residues 144-168 with six potential sites). The concomitant identification of peptides from these domains prior to deglycosylation implies a mixture of ZP3 molecules, some with O-glycans and others without. Multiply charged ions (+3 and +4) at m/z 702.42 … 5B). Moreover, these ions were no longer present upon deglycosylation with endoglycosidases, again supporting the conclusion that this peptide was previously O-glycosylated (data not shown). Unfortunately, even with CID data, we could not determine the exact site of the sugar linkage among the three potential sites (Ser¹⁴⁸, Ser¹⁴⁹, Thr¹⁵⁵) because of the loss of the sugar moiety prior to the peptide backbone cleavage. However, Thr¹⁵⁵ is a predicted O-linked glycosylation site in mouse ZP3 with a probability of 98% (www.cbs.dtu.dk/services/NetOGlyc/). Similarly, a +2 charged ion at m/z 608.27 eluting early in the chromatogram from the tryptic digest of the N/O-deglycosylated sample corresponds to the mass of the ZP3 peptide ¹⁶¹ATVSSEEK¹⁶⁸ with the Galβ(1-3)GalNAc core attached, presumably to either Thr¹⁶² or one of the two serines at positions 164 and 165. The CID spectrum of this ion produced only MH⁺-[Gal(β1-3)GalNAc] at m/z 850.41, perhaps because it was subjected to CID late in peak elution when less precursor ion signal is available. However, its low mass carbohydrate marker ions, including GalNAc+H⁺, (GalNAc-H₂O)+H⁺, (GalNAc-2H₂O)+H⁺, and (GalNAc-HAc)+H⁺ at m/z 204.09, 186.08, 168.08, and 144.07, resembled those of the O-glycosylated peptide described above, indicating that this peptide is clearly O-glycosylated. The lack of peptide ions with sugar moieties attached made it impossible to assign the site of the O-glycan linkage, but based on the predictive algorithm, Thr¹⁶² has a 70% probability of being glycosylated.
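The masses used in this argument (the 365.13-Da core, the +2 glycopeptide ion, and the low mass marker ions) follow from standard monosaccharide residue masses, as the sketch below illustrates. The constant and function names are illustrative; the peptide MH⁺ of 850.41 is the value reported above for the peptide after loss of the core.

```python
PROTON = 1.00728
WATER = 18.01056
ACETIC_ACID = 60.02113
HEX = 162.05282     # hexose residue (Gal), C6H10O5
HEXNAC = 203.07937  # N-acetylhexosamine residue (GalNAc), C8H13NO5

CORE1 = HEX + HEXNAC  # Gal-beta(1-3)-GalNAc core added to Ser/Thr


def glycopeptide_mz(peptide_mh: float, n_cores: int, charge: int) -> float:
    """m/z of a peptide (given as MH+) carrying n core disaccharides."""
    neutral = peptide_mh - PROTON + n_cores * CORE1
    return (neutral + charge * PROTON) / charge


# Peptide 161-168 (MH+ 850.41) with one core, observed at m/z 608.27 (+2)
print(round(CORE1, 2), round(glycopeptide_mz(850.41, 1, 2), 2))

# Low mass GalNAc marker (oxonium) ions
hexnac_ion = HEXNAC + PROTON
markers = {
    "GalNAc+H": hexnac_ion,
    "(GalNAc-H2O)+H": hexnac_ion - WATER,
    "(GalNAc-2H2O)+H": hexnac_ion - 2 * WATER,
    "(GalNAc-HAc)+H": hexnac_ion - ACETIC_ACID,
}
for name, mz in markers.items():
    print(f"{name}: {mz:.2f}")
```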
Earlier studies have described mouse ZP3 as the primary sperm receptor, an activity ascribed to O-glycans attached at Ser³³² and Ser³³⁴ (37,38). However, the trypsin/Asp-N digest of the native ZP mixture generated the masses at m/z 723.34²⁺ and 482.56³⁺ as described above (Fig. 4C). These masses correspond to the peptide ³³⁰DSSSSQFQIHGPR³⁴², where Asp-N cleavage took place at Asn³³⁰ because of the Asn-Asp conversion (i.e. a mass shift of +0.95 Da). Since these masses match the calculated masses of this peptide with the replacement of Asn with Asp (MH⁺ 1445.68) without any prior O-deglycosylation treatment (N-deglycosylated sample), and since the peptide identity was confirmed by CID sequence data, neither Ser³³² nor Ser³³⁴ is O-glycosylated at a measurable level. Because glycosylation at these sites was inferred from previous mutational studies (37), we looked specifically for the masses corresponding to various combinations of glycosylation sites using extracted ion chromatograms in the N/O-deglycosylated samples but did not find them. Thus, to the extent of our mass spectrometric detection (low femtomole levels), we did not observe glycosylation of any potential O-glycosylation sites except an N-terminal cluster (predicted to be Thr³², Thr³⁴, Ser³⁹) and a second cluster in the zona domain (predicted to be Thr¹⁵⁵, Thr¹⁶²).

DISCUSSION
The mammalian zona pellucida is a unique biological structure that surrounds growing oocytes, ovulated eggs, and the pre-implantation embryo (39). Although essential for in vivo fertilization and early development, its biochemical characterization has been impeded by the difficulty of purifying adequate quantities of native material. Earlier studies had determined the presence of three major glycoproteins (ZP1, ZP2, ZP3), and their primary structures have been deduced from cDNA (8). More recent genetic studies using null mutations and replacement with human homologues have provided insight into the molecular basis of sperm binding to the zona matrix (20,26). We now report the biochemical analysis of ZP1, ZP2, and ZP3 in native mouse zonae pellucidae without further purification of individual proteins. Taking advantage of highly accurate and sensitive mass spectrometry, we have determined structural features of individual mouse zona pellucida proteins, including the N and C termini, the presence of intramolecular disulfide linkages, and sites of N- and O-glycosylation.
Proteolytic Processing of Zona Pellucida Proteins-The three zona proteins are distinct from one another with ZP1 and ZP2 more evolutionarily conserved than ZP3 (40). However, as a cohort they share certain common features. Each has a signal peptide to direct it into a secretory pathway and each has an ectodomain that must be released from a transmembrane domain prior to incorporation into the extracellular zona matrix. The native N terminus of each zona protein was determined by mass spectrometry. Both ZP1 (Fig. 6) and ZP3 (Fig. 8) are blocked by a pyroglutamate (pyroGlu²¹ and pyroGlu²³, respectively), and the N-terminal Val³⁵ of ZP2 (Fig. 7) confirms an earlier determination by Edman degradation (6). Thus, the signal peptides of ZP1, ZP2, and ZP3 are 20, 34, and 22 amino acids long, respectively, and the experimentally determined cleavage sites correspond to those of von Heijne's predictive algorithm (33).
FIG. 6. Summary of mouse ZP1. The primary amino acid sequence (single letter code) of ZP1 obtained from the native mouse zona pellucida extends from an N-terminal pyroglutamate (p21) to a C-terminal arginine (R546) immediately upstream of a dibasic cleavage site. There are 21 cysteine residues (yellow on blue background); 10 are in the zona domain (yellow background), of which eight are conserved (C272, C306, C325, C366, C449, C470, C522, C527). One disulfide bond was experimentally determined, C449/C470 (solid line). All four of the potential N-linked sites (white on green background) were glycosylated (N49, N68, N240, N371). Peptides representing ~44% of mature ZP1 were not identified (white on gray backgrounds) because of paucity of biological material. Within these sequences were multiple serine (S) or threonine (T) residues representing potential O-linked glycosylation sites.
Once directed into the secretory pathway, the zona proteins remain associated with the endomembrane system until they are released at the surface of the oocytes. There has been controversy as to the cleavage site required for release of the ectodomain from the predicted transmembrane domain near the C terminus (12,14,15,17). The mass spectrometric data indicate that the C termini of ZP1 (Arg546), ZP2 (Ser633), and ZP3 (Asn351) in native zonae pellucidae are N-terminal to a dibasic motif (ZP1, Arg547-Arg548; ZP2, Lys634-Arg635; ZP3, Arg352-Arg353). These presumed cleavage sites are part of, but distinct from, a proprotein convertase (furin) site (13) that is imperfectly conserved among zona proteins. The ZP1, ZP2, and ZP3 dibasic motifs lie 43, 50, and 37 amino acids, respectively, upstream of the mouse protein transmembrane domains and are conserved in all mammalian species examined to date. It has been suggested that similarly positioned C termini in the quail and Xenopus homologues of ZP3 result from cleavage at the proprotein convertase site followed by carboxypeptidase trimming of two basic residues (41,42). The observation that mutation of the dibasic motif does not preclude secretion and incorporation of mouse ZP3 into the zona pellucida suggests that alternative cleavage sites are available, as has been reported for other secreted proteins (43,44).
Thus, after N- and C-terminal processing, the polypeptide chains of ZP1 (Fig. 6), ZP2 (Fig. 7), and ZP3 (Fig. 8) have predicted molecular masses of 58, 68, and 36 kDa, respectively. These predictions are in good agreement with the apparent molecular masses observed after N/O-deglycosylation of ZP1 (63 kDa), ZP2 (68 kDa), and ZP3 (39 kDa) in native zonae pellucidae by immunoblot (data not shown) and autoradiography (29). The minor discrepancies may reflect residual O-linked sugars predicted to remain after enzymatic deglycosylation, or aberrant migration, and are well within the estimation errors associated with SDS-PAGE.
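The arithmetic behind the processed chain lengths can be sketched as follows. The terminal residue numbers are those reported above. The average residue mass of 110 Da is a common rule of thumb and an assumption of this sketch, not the method used to obtain the 58, 68, and 36 kDa values, which presumably derive from the actual sequences. The function names are illustrative.

```python
# Residue numbers of the native N and C termini reported in the text.
TERMINI = {"ZP1": (21, 546), "ZP2": (35, 633), "ZP3": (23, 351)}

# Assumption (not from the paper): ~110 Da average mass per amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0


def mature_length(first: int, last: int) -> int:
    """Number of residues in the processed chain, both termini included."""
    return last - first + 1


def approx_mass_kda(first: int, last: int) -> float:
    """Crude polypeptide mass estimate in kDa from chain length alone."""
    return mature_length(first, last) * AVG_RESIDUE_MASS_DA / 1000.0


for name, (first, last) in TERMINI.items():
    print(
        name,
        "signal peptide:", first - 1, "residues;",
        "mature chain:", mature_length(first, last), "residues;",
        "approx.", round(approx_mass_kda(first, last), 1), "kDa",
    )
```

Because the estimate ignores amino acid composition, it should only be expected to land within a few kDa of the sequence-based values quoted above.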
Formation of Intramolecular Disulfide Bonds within the Zona Domain-Disulfide linkages are thought to be one of the major factors in stabilizing native conformations of secreted proteins (45,46). No free cysteine residues were detected in the native zona pellucida proteins, and intermolecular disulfide bonds have been observed only in ZP1 (31). A ~260 amino acid zona domain with eight conserved cysteine residues is present in ZP1 (amino acids 288-542), ZP2 (amino acids 363-630), and ZP3 (amino acids 45-308) (9). The mass spectrometric data are most complete for the mouse ZP3 zona domain, in which four disulfide bonds are defined (Fig. 8). The two N-terminal bonds (Cys46/Cys139; Cys78/Cys98) form 1-4 and 2-3 linkages (loop-within-loop), and the two C-terminal disulfide bonds (Cys216/Cys283; Cys240/Cys301) form 1-3 and 2-4 crossover linkages. The four additional cysteine residues in ZP3 (Cys320, Cys322, Cys323, Cys328) lie C-terminal to the zona domain and form two disulfide bonds, the linkage of which is indeterminate because of their tight clustering within nine amino acid residues.
Although incompletely determined, the formation of disulfide bonds in the zona domains of ZP1 (Fig. 6) and ZP2 (Fig. 7) appears to differ from that of ZP3. The two N-terminal bonds (Cys365/Cys457; Cys396/Cys417) in the ZP2 zona domain conform to the loop-within-loop motif observed in ZP3, but the two disulfide bonds at the C terminus of the ZP2 zona domain (Cys608/Cys613; Cys623/Cys627) do not share the ZP3 crossover motif. Disulfide linkage between the remaining cysteine residues (Cys538, Cys559) in the ZP2 zona domain was not determined, but the corresponding residues (Cys449, Cys470) in ZP1 form a disulfide bond. Thus, there appear to be two additional cysteine residues (beyond the eight conserved cysteines) in the zona domains of ZP1 and ZP2 that are not present in ZP3, and disulfide bond formation in the C-terminal half of the ZP2 (and perhaps ZP1) zona domain differs from that of ZP3.
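The linkage nomenclature used above (1-4/2-3 loop-within-loop versus 1-3/2-4 crossover) follows from ranking the cysteines of each group by their order along the chain. A minimal sketch of that bookkeeping, using the bonds reported in the text, is given here; the function name `linkage_pattern` is illustrative and not part of any published tool.

```python
def linkage_pattern(bonds):
    """Describe disulfide connectivity by cysteine order along the chain.

    bonds: list of (residue_a, residue_b) pairs, one pair per disulfide bond.
    Returns a sorted list of (rank_a, rank_b) pairs, where rank 1 is the most
    N-terminal cysteine among those supplied.
    """
    positions = sorted(p for bond in bonds for p in bond)
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    return sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)


# Bonds reported in the text.
zp3_n_terminal = [(46, 139), (78, 98)]
zp3_c_terminal = [(216, 283), (240, 301)]
zp2_c_terminal = [(608, 613), (623, 627)]

print(linkage_pattern(zp3_n_terminal))  # [(1, 4), (2, 3)]  loop-within-loop
print(linkage_pattern(zp3_c_terminal))  # [(1, 3), (2, 4)]  crossover
print(linkage_pattern(zp2_c_terminal))  # [(1, 2), (3, 4)]  sequential
```

The third pattern shows why the C-terminal ZP2 bonds are described as not sharing the ZP3 crossover motif: each bond joins neighbouring cysteines rather than interleaving with the other.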
The zona domain has been implicated in forming protein polymers not only in the zona pellucida matrix, but also between constituents of the extracellular tectorial membrane found in the inner ear (10,47). Genetically altered mice lacking ZP1 form a zona matrix composed of ZP2 and ZP3 (48); mice lacking ZP2 form a thinner, more fragile matrix composed of ZP1 and ZP3 (49); but mice lacking ZP3 do not form a zona pellucida (11,50). Thus, a zona matrix can be formed by either ZP1/ZP3 or ZP2/ZP3, consistent with the necessity of two types of zona domains: one from ZP3 and the other from either ZP1 or ZP2. Taken together, these data suggest that the structures of the ZP1 and ZP2 zona domains may be similar to each other and different from that of ZP3.
FIG. 7. Summary of mouse ZP2. The primary amino acid sequence (single letter code) of ZP2 obtained from the native mouse zona pellucida extends from an N-terminal valine (V35) to a C-terminal serine (S633) immediately upstream of a dibasic cleavage site. There are 20 cysteine residues (yellow on blue background); 10 are in the zona domain (yellow background), of which eight are conserved (C365, C396, C417, C457, C538, C608, C613, C623). Four disulfide bonds were experimentally ascertained, C365/C457, C396/C417, C608/C613, C623/C627 (solid line). Among the 10 cysteine residues in the N terminus of ZP2, the disulfide linkage of one (C84/C102) was determined. Six of the six potential N-linked sites (white on green background) are glycosylated (N83, N172, N184, N217, N264, N393). Peptides representing ~4% of mature ZP2 were not identified (white on gray backgrounds). Within these sequences was a single potential O-linked glycosylation site (T455).
Glycosylation of Zona Proteins-N-glycosylation plays an essential role in the folding/trafficking of glycoproteins (51,52), and can only occur at asparagines that have a consensus NX(S/T) motif (where X cannot be a proline). O-glycosylation derivatizes the hydroxyl groups of threonine and serine residues and, although there is no particular sequence motif dictating whether glycosylation can take place, flanking amino acids are thought to exert an influence (53,54). Each of the proteolytically processed mouse zona proteins contains a limited number of potential N-linkage glycosylation sites (ZP1, 4 sites; ZP2, 6 sites; ZP3, 6 sites), but considerably more potential O-linkage sites (ZP1, 82 sites; ZP2, 84 sites; ZP3, 58 sites). Zona glycoproteins were either N- or N/O-deglycosylated as described above to identify glycosylated asparagine, serine, and threonine residues.
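Potential N-linkage sites of the kind counted above can be located by scanning a sequence for the NX(S/T) consensus with X not proline. The following is a minimal sketch of such a scan; the toy sequence is hypothetical and is not a zona protein sequence, and the function name `find_sequons` is illustrative.

```python
import re

# N-X-(S/T) with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str, first_residue: int = 1):
    """Return 1-based positions of Asn residues in an N-X-(S/T) motif.

    first_residue is the residue number of sequence[0], so that positions
    can be reported in precursor numbering for a processed chain.
    """
    return [m.start() + first_residue for m in SEQUON.finditer(sequence)]


# Hypothetical toy sequence for illustration only.
toy = "MNGSANPTQNNTSK"
print(find_sequons(toy))  # [2, 10, 11]; the Asn in "NPT" at position 6 is excluded
```

A scan of this kind identifies only potential sites; whether a given asparagine is actually occupied has to be established experimentally, as described in the following paragraph.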
Deglycosylation with PNGase F releases the entire N-glycan bound to asparagine residues and, by converting the residue to aspartic acid, provides an unequivocal mass spectrometric signature of the glycosylation site. All four potential N-linked sites on ZP1 (Asn49, Asn68, Asn240, Asn371) contain carbohydrate side chains (Fig. 6), and all six sites on ZP2 (Asn83, Asn172, Asn184, Asn217, Asn264, Asn393) are also occupied (Fig. 7), in accord with early estimates (55,56). Five of the six potential N-linked sites on ZP3 (Asn146, Asn273, Asn304, Asn327, Asn330) have carbohydrate side chains (Fig. 8), which is somewhat more extensive than earlier reports (57). Only Asn227 on ZP3 was experimentally determined by mass spectrometry and CID not to be glycosylated, perhaps due to inaccessibility or the presence of proline residues immediately upstream and downstream of the consensus motif. Taken together, these data show that all but one asparagine residue within the NX(S/T) consensus motif is N-glycosylated in mature, native ZP1, ZP2, and ZP3. The molecular masses of N-glycans attached to the mouse zona pellucida range from 1.6 to 3.8 kDa (18), and based on the number of side chains it appears that ~15-30% of the mass of individual mouse zona proteins is N-linked carbohydrate side chains.
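The mass signature of the Asn-to-Asp conversion can be illustrated with a short calculation. The monoisotopic residue masses below are standard values supplied for this sketch rather than figures quoted in the text, and the function name is illustrative.

```python
# Monoisotopic residue masses (Da); standard values, not taken from the text.
ASN_RESIDUE = 114.042927  # asparagine residue, C4H6N2O2
ASP_RESIDUE = 115.026943  # aspartic acid residue, C4H5NO3


def deglycosylated_mass_shift(n_sites: int) -> float:
    """Mass gained by a peptide when n_sites formerly glycosylated Asn become Asp."""
    return n_sites * (ASP_RESIDUE - ASN_RESIDUE)


print(round(deglycosylated_mass_shift(1), 3))  # 0.984
```

Each formerly occupied site therefore adds about 0.984 Da relative to the mass predicted from the unmodified sequence, a difference that high resolution instruments can readily distinguish.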
The composition of O-glycans isolated from native mouse zona pellucida has been determined by chromatography and mass spectrometry (18). Although association with individual zona proteins was not reported, O-linked sugars ranged in size from three to six residues, did not include fucose, and the great majority had core-2 type structures, Gal(β1-3)GalNAc, which provides a useful identification tag. We have reasoned that if a peptide is detected prior to deglycosylation or in an N-deglycosylated sample, then it is not O-glycosylated. Conversely, O-glycosylated peptides would only be found after removal of their O-glycans. Exo-O-glycosidases remove O-linked sugars from zona proteins, leaving a Gal(β1-3)GalNAc core attached to serine/threonine residues. Endo-O-glycosidase can be used in addition to exo-O-glycosidases to remove the core sugars with no modification of the serine/threonine residues. Thus, in addition to CID data detecting the attached sugar, the presence of the Gal(β1-3)GalNAc tag (365.13 Da) on the serine/threonine residues before, but not after, treatment with Endo-O-glycosidase is useful in identifying O-glycan sites. However, in view of the fact that evidence has been found for loss of at least one type of O-linked sugar (mannose) upon collision in a triple stage quadrupole (58), one must consider the possibility that similar losses of the closely related O-linked GalNAc residue may arise from collisional processes in the source region.
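The 365.13 Da value of the Gal(β1-3)GalNAc tag can be reproduced by summing the masses of one hexose and one N-acetylhexosamine residue. The monoisotopic residue masses in this sketch are standard values supplied here as an assumption, not figures given in the text.

```python
# Monoisotopic masses (Da) of glycosidically linked sugar residues;
# standard values, not taken from the text.
HEX_RESIDUE = 162.052824     # galactose residue, C6H10O5
HEXNAC_RESIDUE = 203.079373  # N-acetylgalactosamine residue, C8H13NO5


def glycan_tag_mass(n_hex: int, n_hexnac: int) -> float:
    """Mass added to a Ser/Thr residue by the given number of sugar residues."""
    return n_hex * HEX_RESIDUE + n_hexnac * HEXNAC_RESIDUE


print(round(glycan_tag_mass(1, 1), 2))  # 365.13
```

A peptide carrying the core disaccharide should thus appear 365.13 Da heavier than the same peptide after endo-O-glycosidase treatment.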
Experimental determination by mass spectrometry of O-linked sites on ZP1 and ZP2 was not successful because of either incomplete coverage (ZP1) or a paucity of O-linked sugars (ZP2). Greater success was obtained with ZP3. Two clusters of O-linked glycosylation were detected on native ZP3 (Fig. 8). One, at the N terminus, appears to contain three occupied amino acid residues (predicted to be Thr32, Thr34, Ser39), and a second, in the middle of the zona domain, has two O-linkage sites (predicted to be Thr155, Thr162). The identification of peptides from these regions prior to deglycosylation suggests that O-glycosylation in some cases is heterogeneous, with some ZP3 molecules containing O-glycans and others not.
FIG. 8. Summary of mouse ZP3. The primary amino acid sequence (single letter code) of ZP3 obtained from the native mouse zona pellucida extends from an N-terminal pyroglutamate (p23) to a C-terminal asparagine (N351) immediately upstream of a dibasic cleavage site. There are eight conserved cysteine (yellow on blue background) residues in the zona domain (yellow background) that are disulfide-linked, C46/C139, C78/C98, C216/C283, C240/C301 (solid line), as well as four cysteines (C320, C322, C323, C328) that are C-terminal to the zona domain. The linkage of the latter (dotted lines) was indeterminate due to clustering of cysteine residues and the absence of appropriate cleavage sites. Five of the six potential N-linked sites (white on green background) are glycosylated (N146, N273, N304, N327, N330, but not N227), and there appear to be two clusters of O-linked glycans at the N terminus (predicted at T32, T34, S39) and within the zona domain (predicted at T155, T162). Clusters are indicated by bracket, potential sites by asterisks, and number of glycans by arabic numbers.
The biological functions of glycosylation in zona pellucida proteins remain to be determined. Treatment with tunicamycin, which prevents the addition of N-linked sugars, has been variously reported to inhibit or facilitate the secretion of ZP2 and ZP3 (59,60). More controversially, mouse ZP3 has been described as the primary receptor for sperm binding, a biologic activity ascribed to oligosaccharide side chains linked to Ser332 and Ser334 (19). However, neither serine is occupied by O-linked oligosaccharide side chains, as evidenced by the presence of the peptide DSSSSQFQIHGPR (residues 330-342) under reducing and non-reducing conditions (confirmed by MS and CID data), which was detected without any prior O-deglycosylation. Additionally, transgenic mice expressing mutant ZP3 (Ser332 → Gly332; Ser334 → Ala334) have normal fertility (61), although the more definitive assessment of their reproductive fitness in the Zp3 null background has not been reported. Whether the N-terminal or zona domain cluster of O-glycans plays a role in sperm binding remains to be determined, but it seems unlikely that they act as the sole sperm receptor given the genetically altered mice in which sperm continue to bind to the zona pellucida despite the cortical granule reaction and the release of putative glycosidases (20).