Structure of Saccharomyces cerevisiae alpha-agglutinin. Evidence for a yeast cell wall protein with multiple immunoglobulin-like domains with atypical disulfides.

α-Agglutinin of Saccharomyces cerevisiae is a cell wall-associated protein that mediates cell interaction in mating. Although the mature protein includes about 610 residues, the NH2-terminal half of the protein is sufficient for binding to its ligand a-agglutinin. α-Agglutinin 20-351, a fully active fragment of the protein, has been purified and analyzed. Circular dichroism spectroscopy, together with sequence alignments, suggests that α-agglutinin 20-351 consists of three immunoglobulin variable-like domains: domain I, residues 20-104; domain II, residues 105-199; and domain III, residues 200-326. Peptide sequencing data established the arrangement of the disulfide bonds in α-agglutinin 20-351. Cys 97 is disulfide-bonded to Cys 114, forming an interdomain bond between domains I and II. Cys 202 is bonded to Cys 300 in an atypical intradomain disulfide bond between the A and F strands of domain III. Cys 227 and Cys 256 have free sulfhydryls. Sequencing also showed that at least two of three potential N-glycosylation sites with sequence Asn-Xaa-Thr are glycosylated. At least one of three Asn-Xaa-Ser sequences is not glycosylated. No residues NH2-terminal to Ser 282 were O-glycosylated, whereas Ser 282 and all hydroxy amino acid residues COOH-terminal to this position were modified. Therefore O-glycosylated Ser and Thr residues cluster in the COOH-terminal region of domain III, and the O-glycosylation continues into a Ser/Thr-rich sequence that extends from domain III to the COOH terminus of the full-length protein.
Sexual agglutinins are expressed on the surface of haploid budding yeasts, including Saccharomyces cerevisiae (Lipke and Kurjan, 1992; Pierce and Ballou, 1983; Hagiya et al., 1977; Crandall et al., 1974; Crandall and Brock, 1968). During mating, the interaction of complementary agglutinins of each species mediates direct cell-cell contact to promote fusion of pairs of mating partners to form diploid zygotes. Mutants defective in these sexual agglutinins are mating-deficient in liquid medium.
S. cerevisiae α-agglutinin is a highly glycosylated cell wall-anchored protein that is constitutively expressed on cells of the α mating type and is induced to greater expression levels in response to the mating pheromone, a-factor (Terrance et al., 1987; Hauser and Tanner, 1989; Lipke et al., 1989). The open reading frame of the α-agglutinin gene, AGα1, encodes a single polypeptide of 650 amino acids, including an NH2-terminal secretion signal (residues 1-19) and a COOH-terminal glycosylphosphatidylinositol (GPI) addition signal that is involved in cell wall anchorage (residues 628-650) (Kodukula et al., 1993; Wojciechowicz et al., 1993; Kapteyn et al., 1994; Lu et al., 1994, 1995; Van Berkel et al., 1994). The NH2-terminal part of the mature protein (residues 20-350) contains the binding region, which has been proposed to consist of three domains (Wojciechowicz et al., 1993). These features are summarized in Fig. 1.
Within the NH2-terminal half, a segment (amino acid residues 200-326, designated domain III) shows significant similarity to variable domains of the immunoglobulin superfamily (IgV domains) based on the amino acid sequence and predicted β-sheet profile analysis (Wojciechowicz et al., 1993). A His residue essential for binding has been identified within this putative domain (Cappellaro et al., 1991), and other essential residues have been identified by site-specific mutagenesis. We have proposed that domains I and II are also Ig-like, but evidence to support this contention has been lacking.
In Ig domains, post-translational modifications help determine tertiary structure (Dwek et al., 1993; Williams and Barclay, 1988). We have investigated the disulfide bonding pattern of the six Cys residues and the positions of the N- and O-glycosylations in the Ig-like region (Terrance et al., 1987; Hauser and Tanner, 1989). N-Linked glycans are not important for cell adhesion, because endo H treatment or synthesis in the presence of tunicamycin does not affect binding activity (Terrance et al., 1987). O-Linked glycans are also present and appear to account for a significant portion of the apparent size of α-agglutinin (Wojciechowicz et al., 1993; Lu et al., 1994).
We have now produced a 332-residue active fragment, α-agglutinin 20-351, in quantities sufficient to allow investigation of the secondary structure and determination of the positions of post-translational modifications. Together with a modified sequence alignment procedure, the results lead to a model for α-agglutinin.

EXPERIMENTAL PROCEDURES
Chemicals and Reagents-All chemicals were from Sigma, unless otherwise stated, and of appropriate purity. Nitrocellulose membranes were from Schleicher & Schuell. Reagents for gel electrophoresis were from Kodak Scientific Imaging Systems. Protein standards and Bio-Gel P-60 were purchased from Bio-Rad. Reagents for polymerase chain reactions were obtained from Perkin-Elmer, and restriction enzymes were from New England Biolabs or U. S. Biochemical Corp. Endoprotease Arg-C, sequencing grade Staphylococcus aureus V8, hydrophilic bead-bound trypsin, and endoprotease Asp-N were from Boehringer Mannheim. The cysteine-specific reagent P-2007 (N-(1-pyrenemethyl)iodoacetamide) and reducing reagent TCEP (tris-(2-carboxyethyl)phosphine hydrochloride) were from Molecular Probes. Immobilon-AV membranes were purchased from Millipore.
Construction of pPGK-AGα1 351 -Two single-stranded oligonucleotides were synthesized to use as primers for the construction of pPGK-AGα1 351. AGα5′-H3′, TTC GCC AAG CTT TTC AAA ATG TTC ACT TTT CTC, and AGαM-H3′, AAA TGG AAG CTT TGG ATT ACG CAC TAG TGT TTA TAC TTG T, contain HindIII sites (AAG CTT) outside the open reading frame. The 3′ end primer included a stop codon at the position corresponding to Tyr 352 in the deduced α-agglutinin protein sequence. The DNA fragment encoding α-agglutinin 20-351 was amplified using the AGα1-containing plasmid pH27 as template in a polymerase chain reaction. The polymerase chain reaction product contained the open reading frame of AGα1 from nucleotides 1 to 1053 and included the sequence encoding the secretion signal. The purified polymerase chain reaction product was cloned into the HindIII site of the expression vector YEp-PGK. The orientation of the insert was checked by restriction mapping with EcoRI, HindIII, and BamHI, and the sequence of the insert in pPGK-AGα1 351 was verified by DNA sequencing.
Overexpression and Purification of α-Agglutinin 20-351 from Culture Supernatant-pPGK-AGα1 351, encoding α-agglutinin 20-351, was introduced into the agα1 mutant Lα21. Transformants were grown to stationary phase in 1-liter cultures of synthetic uracil-less medium overnight at room temperature. The cells were centrifuged, and the culture supernatant was concentrated 10-fold through a Millipore filtration apparatus equipped with a membrane having a 100-kDa molecular weight cutoff. Aliquots of concentrated supernatant (50 ml) were dialyzed overnight against 4 liters of 10 mM sodium acetate buffer, pH 5.5, at 4°C. The dialyzed material was partially purified by chromatography on a DEAE-Sephadex column (120-ml bed volume) that had been equilibrated with 10 mM sodium acetate, pH 5.5. The column was washed with the same buffer and eluted with 300 mM sodium chloride, 10 mM sodium acetate, pH 5.5, in 3-ml fractions. The α-agglutinin 20-351 content of each eluted fraction was determined by assaying for agglutinin activity (Terrance and Lipke, 1981). Fractions containing activity were pooled for further purification.
The active material was dialyzed and lyophilized. The dry powder was resuspended in 10 mM potassium chloride, 10 mM sodium acetate, pH 5.5, 0.01% SDS, and 1 mM EDTA and incubated with a 1:200 to 1:500 molar ratio of endo H for 4-6 h at 25°C or overnight at 4°C. Under these conditions, there was no detectable proteolysis of the α-agglutinin 20-351. The de-N-glycosylated α-agglutinin 20-351 was chromatographed on a Bio-Gel P-60 size exclusion column (60-ml bed volume) that had been equilibrated with 30 mM sodium acetate buffer, pH 5.5.
Immunoblots-Rabbit polyclonal antisera against α-agglutinin were raised by injection of purified deglycosylated α-agglutinin 20-351. Immunoblots were performed as described previously (Harlow and Lane, 1988; Wojciechowicz and Lipke, 1989). Briefly, after overnight transfer of proteins from SDS gels to nitrocellulose membranes, the membranes were blocked with 3% gelatin in phosphate-buffered saline and then incubated in the same buffer with 1% gelatin, 0.1% Tween 20, and a 1:1000 dilution of antibody that had been adsorbed with heat-killed a cells. A second incubation followed with a 1:1000 dilution of peroxidase-conjugated goat anti-rabbit IgG antibody (Sigma) in the same buffer. The blots were stained by peroxidase-mediated reaction of 4-chloro-1-naphthol with hydrogen peroxide.
CD and Structure Analysis-Cuvettes of 0.1-cm path length were used for far-UV spectra, with typically five spectra being accumulated, averaged, and base line-corrected on an AVIV CD spectrometer model 60 DS (Lakewood, NJ) interfaced to an IBM personal computer. All spectra were acquired at 25°C. For conversion to mean residue ellipticity, a mean residue weight of 111.77 was used. The program PROSEC (Yang et al., 1986) was used to analyze secondary structure distribution from recorded CD data. Smoothing was by the Gram method.
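The conversion to mean residue ellipticity mentioned above can be sketched in Python using the commonly used relation [θ] = θ·MRW / (10·l·c), with θ in millidegrees, l in cm, and c in mg/ml. This is only an illustrative sketch, not the software used for the measurements: the function name, the observed reading, and the protein concentration are hypothetical, and only the 0.1-cm path length and the mean residue weight of 111.77 are taken from the text.

```python
def mean_residue_ellipticity(theta_mdeg, mrw, path_cm, conc_mg_per_ml):
    """Convert an observed ellipticity (millidegrees) to mean residue
    ellipticity (degrees cm^2 dmol^-1) via theta * MRW / (10 * l * c)."""
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


MRW = 111.77    # mean residue weight quoted in the text
PATH_CM = 0.1   # cuvette path length quoted in the text

# Hypothetical reading and concentration, for illustration only.
theta_obs_mdeg = -4.0
conc_mg_per_ml = 0.1

mre = mean_residue_ellipticity(theta_obs_mdeg, MRW, PATH_CM, conc_mg_per_ml)
print(f"{mre:.1f} deg cm^2 dmol^-1")
```

Keeping the units explicit in the argument names is a deliberate choice, since unit mix-ups (mm versus cm, molar versus mg/ml) are the usual source of errors in this conversion.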
Endoprotease Digestions-Proteolytic digestions were initially conducted on heat-denatured α-agglutinin 20-351 in the presence of 10% acetonitrile for 18 h at 25°C, according to the manufacturers' suggested protocols. The ratio of protease to substrate was 1:25 for trypsin and S. aureus V8, 1:100 for endoprotease Arg-C, and 1:5 for endoprotease Asp-N. Some trypsin and S. aureus V8 digestions were performed on nondenatured α-agglutinin 20-351. Endoprotease Arg-C digestions were performed in 0.1 M NH4HCO3 buffer, pH 7.8. S. aureus V8 digestions were performed in 0.1 M sodium phosphate buffer, pH 7.8. Under these conditions, S. aureus V8 cleaves at the COOH-terminal side of both Glu and Asp, with some cleavage at Asn and Gln residues (Drapeau, 1978). Trypsin digestions were performed in 0.1 M Tris buffer, pH 8.0. HPLC-purified tryptic peptides were lyophilized to remove acetonitrile and trifluoroacetic acid prior to digestion with endoprotease Asp-N in 100 mM Tris buffer, pH 7.0.
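To predict which peptides such digestions should yield, the nominal cleavage rules can be applied to a sequence in silico. The Python sketch below is a simplified illustration under stated assumptions: it uses the textbook specificities (trypsin after Lys and Arg, S. aureus V8 after Glu and Asp as described above, Asp-N before Asp), ignores missed cleavages and the minor V8 cleavage at Asn and Gln, and runs on a made-up toy sequence rather than on α-agglutinin. The function name `digest` is hypothetical.

```python
def digest(sequence, residues, side="C"):
    """Split a one-letter sequence after (side="C") or before (side="N")
    every residue listed in `residues`; empty fragments are never emitted."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        cut = i + 1 if side == "C" else i
        if aa in residues and start < cut < len(sequence):
            fragments.append(sequence[start:cut])
            start = cut
    fragments.append(sequence[start:])
    return fragments


toy = "ACDKGERCDLK"  # hypothetical sequence, not from the protein

tryptic = digest(toy, "KR", side="C")   # trypsin: after Lys/Arg
v8 = digest(toy, "ED", side="C")        # S. aureus V8: after Glu/Asp
asp_n = digest(toy, "D", side="N")      # Asp-N: before Asp

# Fragments that would need inspection for disulfides or free thiols.
cys_tryptic = [p for p in tryptic if "C" in p]
print(tryptic, v8, asp_n, cys_tryptic)
```

Comparing predicted Cys-containing fragments from two proteases with different specificities is what allows an assignment made with one digest to be confirmed with the other.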
HPLC-Peptide mixtures derived from digests and reduced digests were fractionated by reversed phase HPLC on an Applied Biosystems instrument using a fully end-capped Microbore Vydac C18 column (3 cm × 3 mm inner diameter, 5 µm) with a Brownlee RP-300 guard column. Solvent A was 0.1% (v/v) aqueous trifluoroacetic acid and solvent B was 90% acetonitrile in water (v/v) containing 0.1% trifluoroacetic acid (v/v). The flow rate was 50 µl/min. The column effluent was monitored by absorbance at 220 nm, and peptide peaks were collected manually. For most tryptic digestions, products were fractionated with a linear gradient from 0 to 60% solvent B in 180 min. For identification of cysteine-specific labeled tryptic peptides, the gradient was programmed linearly from 0% solvent B to 100% solvent B in 200 min. For S. aureus V8 digestion, all peptides were fractionated by a linear solvent gradient from 0 to 45% solvent B in 180 min.
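Because three different linear gradients are used, it can help to translate a retention time into solvent composition before comparing peaks across runs. The following Python sketch assumes an ideal linear gradient with no dwell volume or delay, which a real instrument will not satisfy exactly; the function names are hypothetical, and the gradient end points and the 90% acetonitrile content of solvent B are taken from the text.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """Percent solvent B at time t for an ideal linear gradient.
    The composition is clamped to the start/end values outside the ramp."""
    frac = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * frac


def acetonitrile_percent(b_percent, acn_in_b=90.0):
    """Acetonitrile content (% v/v) given %B, for solvent B at 90% acetonitrile."""
    return b_percent * acn_in_b / 100.0


# Gradients described in the text: (start %B, end %B, duration in min).
GRADIENTS = {
    "tryptic": (0.0, 60.0, 180.0),
    "labeled_tryptic": (0.0, 100.0, 200.0),
    "v8": (0.0, 45.0, 180.0),
}

b_mid = percent_b(90.0, *GRADIENTS["tryptic"])
print(b_mid, acetonitrile_percent(b_mid))
```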
Cysteine-specific Labeling of Tryptic Peptides of α-Agglutinin 20-351 -The cysteine-specific reagent, P-2007, was used to label free sulfhydryl groups. The reaction was performed overnight at room temperature in 80% dimethyl formamide in 0.1 M phosphate buffer, pH 7.2. In some cases, the digestion mixtures were treated with the reducing reagent TCEP, 7 mM, before labeling. Because TCEP does not react with P-2007, a simplified procedure was used, and the alkylation was carried out in the presence of the reducing reagent. Additives and trypsin beads were removed on a Bio-Gel P-2 spin column after labeling. The labeled mixture was separated on a microbore C18 HPLC column, and P-2007-labeled peptides were detected at 341 nm.
NH2-terminal Sequencing of Peptides-Peptides were sequenced by automated Edman degradation in a gas-phase sequenator (model 470A, Applied Biosystems Inc.). The resulting phenylthiohydantoin-derivatized amino acid residues were separated on a Vydac C18 column using a 120A phenylthiohydantoin analyzer (Applied Biosystems Inc.). Individual amino acid residues were identified and quantitated by comparison with standards. Peptides resolved and sequenced are summarized in Fig. 8.
Dot Blot Analysis of O-Linked Proteolytic α-Agglutinin 20-351 Peptides-Each fraction from the reversed phase column was vacuum-evaporated to dryness and resuspended in 30 µl of 0.5 M sodium phosphate buffer, pH 8.0, containing 0.1% SDS (w/v). Peptide samples (3 × 1 µl) were spotted onto an Immobilon-AV membrane. The membrane was air-dried and incubated for 30 min in 10 mM Tris-HCl buffer, pH 7.5, containing 0.15 M sodium chloride and 0.1% Tween 20 (TTBS) and then blocked in a 3-h incubation with fresh 10% ethanolamine (v/v) in 1 M sodium bicarbonate buffer, pH 9.5. After blocking, the membranes were incubated for 1 h with 0.5 µg/ml concanavalin A (ConA)-conjugated peroxidase in TTBS. After washing three times in 10 mM Tris-HCl buffer, pH 7.5, containing 0.15 M NaCl, the membranes were stained with 4-chloro-1-naphthol and hydrogen peroxide (Canas et al., 1993).
Other Methods-SDS-polyacrylamide gel electrophoresis was carried out according to the method of Laemmli (1970), using 12 and 15% gels. Proteins were visualized by staining with Coomassie Blue or a Silver Staining Plus kit (Bio-Rad). Protein concentrations were determined by the bicinchoninic acid protein assay (Pierce) using bovine serum albumin, fraction V, as standard.

RESULTS
Expression and Secretion of α-Agglutinin 20-351 -The plasmid pPGK-AGα1 351 encodes a 351-residue form of α-agglutinin that lacks the COOH-terminal sequences that anchor α-agglutinin to the cell wall (Wojciechowicz et al., 1993), and therefore the product, α-agglutinin 20-351, is secreted into the culture medium after cleavage of the 19-residue secretion signal. An agα1 mutant harboring this plasmid secreted 4.5 × 10⁴ units (about 1 mg) of α-agglutinin/liter (Terrance et al., 1987; Wojciechowicz et al., 1993). α-Agglutinin 20-351 in crude culture supernatants was identified by immunoblots before and after endoglycosidase H treatment (Fig. 2). The fully glycosylated protein had an apparent molecular size of 110 kDa. After removal of N-linked carbohydrates with endo H, the molecular size of α-agglutinin 20-351 was reduced to 45 kDa. In some preparations, the protein was present as a doublet (Fig. 3), due to incomplete removal of N-linked glycan at one site (data not shown). The mobility of the deglycosylated α-agglutinin 20-351 decreased after treatment with DTT (Fig. 2). This decrease implies an increase in the Stokes radius caused by reduction of disulfide bonds.
Elution of endo H-treated α-agglutinin 20-351 from a Bio-Gel P-60 column gave purified α-agglutinin with an apparent molecular size of 45 kDa for the smaller species on SDS gels (Fig. 3). The deduced Mr of α-agglutinin 20-351 from the predicted amino acid sequence is 37,108.
Therefore, N-linked carbohydrate accounts for two-thirds of the apparent 110-kDa molecular mass of α-agglutinin 20-351, and the O-linked carbohydrate remaining after endo H digestion could account for an additional 8 kDa of apparent mass.
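The mass bookkeeping behind this estimate can be written out explicitly. The Python sketch below simply subtracts the figures quoted in the text (110 kDa apparent size, 45 kDa after endo H, and a deduced Mr of 37,108); the variable names are hypothetical, and apparent sizes from SDS gels are approximate, so the results should be read as rough estimates only.

```python
# Values quoted in the text (apparent sizes from SDS-PAGE, in kDa).
apparent_native_kda = 110.0   # fully glycosylated fragment
apparent_endo_h_kda = 45.0    # after removal of N-linked glycans
deduced_mr_kda = 37.108       # polypeptide alone, from the sequence

# Apparent mass removed by endo H is attributed to N-linked carbohydrate.
n_linked_kda = apparent_native_kda - apparent_endo_h_kda
n_linked_fraction = n_linked_kda / apparent_native_kda

# What remains above the polypeptide mass is attributed to O-linked glycans.
o_linked_kda = apparent_endo_h_kda - deduced_mr_kda

print(f"N-linked: {n_linked_kda:.0f} kDa ({n_linked_fraction:.0%} of apparent mass)")
print(f"O-linked: {o_linked_kda:.1f} kDa")
```

With these inputs the N-linked share comes out near 60% of the apparent mass, the same order as the two-thirds quoted, and the O-linked remainder is close to 8 kDa.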
The NH2-terminal sequence of each fragment was determined by microsequence analysis after electroblotting onto polyvinylidene difluoride membranes. Both the 16- and 21-kDa fragments had the same NH2-terminal sequence as mature α-agglutinin, beginning at Ile 20, immediately following the secretion signal sequence (Table I).

FIG. 2. ... (17 ml) was lyophilized to dryness, resuspended in 200 µl of distilled water, and passed through a Bio-Gel P-10 column preequilibrated with 0.01 M sodium acetate, pH 5.5. Desalted material (20 µl) was treated without or with endo H (0.5 µl of 1 unit/ml) at room temperature for 2 h. Samples without and with endo H treatment were analyzed by electrophoresis in the absence or presence of the reducing reagent DTT as indicated.

FIG. 3. Bio-Gel P-60 chromatography of endo H-treated α-agglutinin 20-351. The active material from DEAE-Sephadex A-25 was lyophilized to dryness. The material was resuspended and dialyzed against 0.03 M sodium acetate, pH 5.5, treated with endo H (15 µl of 1 unit/ml endo H to 2000 units of α-agglutinin activity) and loaded onto a Bio-Gel P-60 column preequilibrated with the same buffer. Fractions (3 ml) were collected and monitored at 280 nm (A). Aliquots of fractions were electrophoresed on a 12% SDS-PAGE gel and visualized by staining with Coomassie Blue (B). Molecular size markers are shown on the left.

FIG. 4. SDS-PAGE analysis of endoprotease Arg-C-digested α-agglutinin 20-351. Samples of endoprotease Arg-C-digested α-agglutinin 20-351 (left lanes) and endoprotease alone (center lanes, labeled "enzyme") were treated with or without DTT as marked and electrophoresed on a 15% SDS-polyacrylamide gel, and the gel was stained with Coomassie Blue. Molecular size standards on the right were from 97,400 to 4000 Da.

... this peptide is 21,989 Da. The extra 7 kDa of apparent molecular mass in α-agglutinin 155-351 may be attributed to the presence of multiple O-glycosylations (see below). No additional fragments were seen, including any of the predicted peptides following Arg residues (Fig. 4). Therefore, endoprotease Arg-C cleaved only at Lys 154, instead of any of the six Arg residues in α-agglutinin 20-351.
Endoprotease Arg-C from Clostridium histolyticum also cleaved at Lys 154 only (data not shown). Peptide sequencing confirmed that the cleaved residue was Lys. No fragments were generated in α-agglutinin 20-351 incubated without protease. Therefore, hydrolysis of α-agglutinin 20-351 at Lys 154 was endoprotease Arg-C specific and not due to proteolytic activity in the α-agglutinin preparations or in other reagents used for the digestion. Tosyl-lysyl chloroketone inhibits Arg-C (Mazzoni et al., 1991); therefore, Arg-C must have proteolytic activity toward Lys.
Agglutination Activity of Proteolytic Fragments of α-Agglutinin 20-351 -To examine whether any of the endoprotease Arg-C-digested fragments retained agglutination activity, protease-treated α-agglutinin 20-351 was reconstituted with sodium acetate buffer to pH 5.5 and assayed for activity. This material had no measurable agglutination activity at concentrations up to 6.7 µg/ml, whereas native α-agglutinin 20-351 was active at 3.3 ng/ml. Therefore the agglutination activity was less than 2 × 10⁻⁴ that of intact α-agglutinin 20-351. Similarly, the 31-kDa α-agglutinin 155-351 fragment purified on a Bio-Gel P-30 column had less than 10⁻⁴ of the binding activity of α-agglutinin 20-351 (data not shown).
CD of Native α-Agglutinin 20-351 -α-Agglutinin 20-351 has been proposed to consist of three Ig-like domains, which would consist predominantly of antiparallel β-sheets along with associated turns and loops, but little or no α-helix content (Williams and Barclay, 1988; Wojciechowicz et al., 1993). The CD spectrum of α-agglutinin 20-351 (Fig. 5) showed a typical β-sheet structure profile, with a negative band at 217 nm (Brahms and Brahms, 1980). The absence of the intense negative peaks at either 208 or 222 nm, which are characteristic of α-helix, indicated very little α-helix content in α-agglutinin 20-351. Quantitative analysis of the CD spectrum of α-agglutinin 20-351 indicated the presence of 6.8% α-helix, 69.4% β-sheet, 13.2% turns, and 10.5% random structure. This high β-sheet content suggests the presence of antiparallel β-sheet structures, consistent with Ig domains.
CD of α-Agglutinin 20-351 Digested with Endoprotease Arg-C-Because α-agglutinin 20-351 is inactivated by endoprotease Arg-C cleavage at Lys 154, the effect of the digestion on the structure of α-agglutinin 20-351 fragments was examined. The digestion product showed substantial reduction in β-sheet content when spectra were taken at pH 7.8 (Fig. 5). However, after reconstitution at pH 5.5 for 30 min, the CD spectrum of the digest was very similar to that of native α-agglutinin 20-351, in both the negative peak position at 217 nm and the corresponding peak width (data not shown). Quantitative analysis of the CD spectrum revealed that the secondary structural profile was similar to native α-agglutinin 20-351, with 68.8% β-sheet, and a slightly higher aperiodic structure content. This CD profile indicated that the single site digestion at Lys 154 of α-agglutinin 20-351 did not substantially alter the secondary structure of the protein fragments. Therefore, the inactivation of the binding activity is not due to gross structural change during the Arg-C digestion.
Disulfides in Endoprotease Arg-C-digested α-Agglutinin 20-351 -The products of endoprotease Arg-C cleavage of α-agglutinin 20-351 were separable in the absence of reducing agents (Fig. 4), showing that there is no disulfide linkage between them. Both the 21- and the 31-kDa fragments showed lower mobility on SDS-PAGE after DTT treatment, suggesting that each fragment contained one or more internal disulfide bonds. Based on the deduced amino acid sequence, the 21-kDa fragment contained Cys 97 and Cys 114, implying that these two residues form a disulfide bond. The 31-kDa fragment, α-agglutinin 155-351, contained four Cys residues (Cys 202, Cys 227, Cys 256, and Cys 300). Therefore, the disulfide bonds in this fragment could not be determined from the endoprotease Arg-C data.
Identification of Disulfide Bonds-Identification of these disulfide bonds was accomplished by sequencing of tryptic and S. aureus V8 peptides that had different HPLC retention times in the presence and absence of DTT. Free sulfhydryls were identified in peptides that were not affected by DTT and confirmed by labeling with the iodoacetamide derivative P-2007.
α-Agglutinin 20-351 was digested with trypsin in the presence or absence of DTT, and the products were separated by reversed phase chromatography on a C18 column. Three tryptic peptides (T1, T2, T2′) were unique to the nonreduced chromatogram (Fig. 6A), and three peptides (DT1, DT2, and DT3) were unique to the reduced chromatogram (Fig. 6B). These peptides were sequenced and compared with the sequences of the Cys-containing tryptic fragments predicted from the gene sequence (Tables II-IV). Peaks T1 and DT1 had the sequence of the predicted peptide containing both Cys 97 and Cys 114. As with the change in gel mobility, the change in retention time in the presence of DTT implied that these two Cys residues formed an internal disulfide. Similar chromatography and sequencing analyses of peptides from S. aureus V8 digests confirmed this assignment (Tables III and IV): peptide DS2 was seen only after reduction and contained Cys 97. As expected, tryptic peptide T1 containing Cys 97 and Cys 114 was labeled with P-2007 after reduction, but was not labeled in nonreduced samples (Fig. 7, A and B).
Tryptic peaks T2 and T2′ each yielded two sequences in approximately equimolar amounts (Tables II and IV). These sequences were those expected for disulfide-linked peptides containing Cys 202 and Cys 300. Note that the peptides containing Cys 202 do not contain Cys 227, because Lys 213 is efficiently cleaved (Fig. 6; Table IV). The difference in retention times of T2 and T2′ must be due to differential modification of the fragments; differences in the extent of glycosylation of the peptide fragment containing Cys 300 would yield this result. In the chromatogram of tryptic peptides from reduced α-agglutinin 20-351, peaks T2 and T2′ were absent, and new peaks appeared with retention times of 117 and 154 min (labeled DT2 and DT3 in Fig. 6B). Sequencing showed that these peaks were peptides predicted to include Cys 202 and Cys 300, respectively. These results show that Cys 202 and Cys 300 are disulfide bonded. Sequencing of S. aureus V8-digested peptides (Tables III and IV) and P-2007 labeling (Fig. 7) also confirmed this result.
Cys 227 and Cys 256 Have Free Sulfhydryls-Tryptic peptide peak T4 from nonreduced α-agglutinin 20-351 and peak DT4 from reduced α-agglutinin 20-351 had a retention time of 155 min (Fig. 6) and yielded the same sequence containing Cys 227 (Tables IV and V). There is no tryptic site between Cys 227 and Cys 256 (Table III); therefore this peptide should contain both cysteines. This peptide does not appear to include a disulfide bond, because the retention time was not altered by reduction. In support of the presence of free sulfhydryls in this region, a peptide, S4, including a single Cys residue (Cys 256) was obtained and sequenced from S. aureus V8 digestion under both nonreduced and reduced conditions (Table V). Therefore, Cys 256 has a free sulfhydryl.
To verify that peptide peak T4 in the nonreduced profile contained Cys 227 and Cys 256 as free sulfhydryls, this peptide was labeled with P-2007. This peptide alone was labeled in reactions of tryptic digests with P-2007 under nonreducing conditions (Fig. 7, A versus B). To determine if the peptide contained two labeled cysteines, the isolated labeled peptide (Fig. 7C) was further digested with endoprotease Asp-N and rechromatographed (Fig. 7D). Two additional labeled peptides were detected at 35 and 45 min as a result of the digestion. These peptides had the retention times expected for the labeled peptides containing Cys 256 and Cys 227, respectively. The original labeled peptide with a retention time of 53 min, however, was still present, probably due to incomplete digestion. Therefore, both Cys 227 and Cys 256 are free cysteines.

Identification of O-Linked Glycosylation Sites by Peptide Sequencing-We have sequenced all recovered tryptic and S. aureus V8 peptides from α-agglutinin 20-351, giving sequence coverage of about 76%, including three of the six potential N-glycosylation sequences and 52 of the 74 Ser and Thr residues (Fig. 8). Glycosylated Ser or Thr residues are not detected by the sequencer; therefore, peptide sequencing provides an indirect method to identify O-linked glycosylation sites. Absence of a signal for Thr and Ser was interpreted to indicate glycosylation when the expected residues were observed at levels of 20 pmol or greater in the cycles immediately preceding missing Ser or Thr residues. Table VI summarizes the results from sequencing of S. aureus V8 and tryptic α-agglutinin 20-351 peptides from two or more independent peptide sequences. A total of four S. aureus V8 peptides and two tryptic peptides contained modified Ser and Thr residues.
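The decision rule for calling a Ser or Thr residue modified can be expressed as a small function. The Python sketch below is one possible reading of that rule, not the authors' procedure: it assumes that for a run of consecutive blank cycles the relevant "preceding" yield is that of the nearest earlier cycle with a signal, and the function name, the toy peptide, and its yields are all hypothetical. Only the 20-pmol threshold comes from the text.

```python
HYDROXY = {"S", "T"}
MIN_PRECEDING_PMOL = 20.0  # threshold stated in the text


def call_modified_positions(expected, yields_pmol):
    """Return 0-based positions of Ser/Thr residues with no sequencer signal
    (yield None) whose nearest preceding observed cycle gave >= 20 pmol."""
    calls = []
    last_observed = None  # yield of the nearest earlier cycle with a signal
    for i, (aa, y) in enumerate(zip(expected, yields_pmol)):
        if y is not None:
            last_observed = y
        elif (aa in HYDROXY and last_observed is not None
                and last_observed >= MIN_PRECEDING_PMOL):
            calls.append(i)
    return calls


# Hypothetical peptide and per-cycle yields (pmol); None means no signal.
expected = "GASTVSK"
yields = [50.0, 45.0, None, None, 30.0, 12.0, 10.0]

modified = call_modified_positions(expected, yields)
print(modified)
```

The threshold guards against the opposite error: late in a run, when yields have decayed, a blank Ser or Thr cycle is uninformative and is deliberately left uncalled.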
Confirmation of O-Glycans with ConA-O-Linked carbohydrates in yeast interact with ConA, because they consist of one to five α-linked mannose residues (Klis, 1994). To examine whether O-linked glycosylations were responsible for the masking of the undetected Ser and Thr residues, peroxidase-conjugated ConA was used to probe peptides from the nonreduced tryptic digest. Dot blot analysis of HPLC fractions of the nonreduced tryptic digest showed that five peptides reacted positively with ConA (data not shown). These peptides (fractions 4, 5, 24, 25, and 26 of Fig. 6A) correlated with fragments containing modified Ser and Thr residues (Table VI). Because the dot blot experiment does not determine which Ser or Thr residues within a peptide were glycosylated, we cannot definitively conclude that O-glycosylation accounts for all of the modification of Ser or Thr residues in these peptides, but it must account for some.

FIG. 6. ... Tables I, II, III, and IV. Both chromatograms were obtained under standard conditions, and the retention times shown in B apply to both chromatograms. Fraction numbers shown in A correspond to those mentioned in the text for concanavalin A blotting.
Assignment of domain III as an IgV-like domain suggests that there may be additional Ig-like domains in the NH2-terminal region, because multiple sequential Ig domains are often present in members of the Ig superfamily. In members of the superfamily that are cell adhesion proteins, two to five sequential domains are common. These tandem domains are at the NH2 termini of the mature proteins in the vast majority of cases (Williams and Barclay, 1988). Furthermore, the Ig fold appears to be more widespread than the Ig superfamily itself, and proteins with little or no sequence similarity to Ig domains form Ig-like folds. Most of these proteins are involved in cell adhesion or protein-protein interaction (Holmgren et al., 1992; Overduin et al., 1995; Shapiro et al., 1995).
The 180 NH2-terminal residues of α-agglutinin 20-351 are enough to form two more IgV domains, with the G strand of domain I being the A strand of domain II, as in CD4 (Fig. 9) (Williams and Barclay, 1988; Williams et al., 1989; Ryu et al., 1990; Wang et al., 1990; Barclay et al., 1993). A revised alignment procedure for α-agglutinin 20-351 strongly supports a three-domain assignment (Fig. 9). When the sequences of the three proposed domains were aligned with each other and with an IgV consensus based on predicted strand profile (Fig. 9) and hydrophobic moment (Eisenberg et al., 1984) (data not shown), there was high conformity to the consensus in all three domains (Table VII). Although there is a low degree of identity in the alignment, the conserved residues include many of the IgV consensus residues. The alignments shown scored significantly better (Z > 3) than did random sequences of the same composition. Residues in α-agglutinin domains I and II corresponding to the consensus positions for the IgV domains include a Cys residue in each domain (the F strand Cys in domain I and the B strand Cys in domain II) and Trp 55 corresponding to strand C of domain I. There are Met residues in all three proposed α-agglutinin domains in positions analogous to the conserved D-strand Arg in other IgV domains (residues 69, 158, and 274, Fig. 9). In IgV domains, an Asp residue at the beginning of the F strand forms a salt bridge with this Arg, which it could not do with the Met residue in the α-agglutinin. In the three proposed α-agglutinin domains, this Asp is also absent (residues 89, 176, and 293). Although the number of residues conserved among the three domains is low, the three sequences show about 40% similarity (Table VII). The conserved and identical residues are especially frequent at positions conserved in mammalian IgV domains (Fig. 9 and Table VII).
The similarity of domains I and II is also supported by apparent sequence homology detected with a standard method. Residues 30-94 and 107-180 can be aligned with a Z score of 4.7 (GCG BESTFIT, gap weight 3.0, length weight 0.0; Gribskov and Devereux, 1991). Such a score implies a common ancestral sequence and common structure for these regions, which correspond to strands B to F of domains I and II.
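The Z scores quoted above compare an observed alignment score with the scores obtained for shuffled sequences of the same composition. The following Python sketch illustrates that idea only; it is not GCG BESTFIT. It assumes a simple Smith-Waterman local alignment with identity-based scoring and a linear gap penalty, and the function names, scoring values, and test sequence are all hypothetical choices made for illustration.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not BESTFIT settings.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z score of the a/b alignment against shuffles of b.

    Shuffling preserves the composition of b, so the null distribution
    reflects "random sequences of the same composition".
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(local_alignment_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        # Degenerate null distribution: no meaningful spread to scale by.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


# Hypothetical toy sequence (not α-agglutinin): aligned with itself it
# should score far above its own shuffles.
toy = "ACDEFGHIKLMNPQRSTVWY" * 2
z = shuffle_z_score(toy, toy)
print(f"Z = {z:.1f}")
```

With real domain sequences, a substitution matrix and the gap parameters given in the text would replace the identity scoring assumed here, so the absolute Z values from this sketch are not comparable with those reported.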
CD Spectra Are Consistent with Inclusion of α-Agglutinin in the Ig Superfamily - The CD spectrum of α-agglutinin 20-351 was similar to those of other members of the Ig superfamily, showing little or no α-helix and a predominance of β-sheet. The magnitude of the negative peak at 217 nm characteristic of β-sheet was greater in α-agglutinin 20-351 than in the spectrum of Igs themselves, but was in the range of that for many other members of the Ig superfamily (Cathou and Dorrington, 1975; Jefferis et al., 1978; Killeen et al., 1988). The CD profile of α-agglutinin 20-351 is similar to those of MRC OX-45, CD4, Thy-1, and CD2 (Campbell et al., 1979; Killeen et al., 1988; Chamow et al., 1990; Recny et al., 1990). The mean residue ellipticities at 217 nm for α-agglutinin 20-351, Thy-1, and CD2 are −4.68 × 10³, −4.8 × 10³, and −6.6 × 10³ degrees·cm²·dmol⁻¹, respectively. The high β-sheet content of α-agglutinin 20-351 is also close to that of silk fibroin (Demura and Asakura, 1991) and human plasma fibronectin (Oesterlund, 1988), both of which are mostly antiparallel β-sheet structures (65 and 79%, respectively), and may be close to the maximum possible β-sheet content. Such a high β-sheet content can only be accommodated in globular proteins by antiparallel structures. Therefore, the β-sheet content of α-agglutinin 20-351 (70%) is among the highest for known proteins with essentially pure antiparallel β-sheet structures. The unusually high content of antiparallel β-sheet also implies the presence of antiparallel β-sheet structure throughout the molecule and is therefore consistent with the three-domain alignment.

TABLE V
Sequences of Cys-containing peptides whose retention time was not affected by reduction
HPLC-purified tryptic peptides (Fig. 6A) or S. aureus V8 peptides (chromatogram not shown) were sequenced, and the results are listed. Yield of each residue (pmol) is listed beneath each position. X represents missing residues, and the sequence deduced from the nucleotide sequence of the gene follows in lowercase letters. In each case, the isolated peptide could be labeled with P-2007 in the absence of reducing agent (Fig. 7 and data not shown).
It is worth noting that, even if domain III were composed of pure antiparallel β-sheet structure (100% sheet), domains I and II would still have a β-sheet content of at least 50% to yield an overall β-sheet content of 70% in α-agglutinin 20-351. Therefore, β-sheet is the predominant structure in all of the domains. Domain III (residues 200-326) was previously proposed to contribute to the binding site (Cappellaro et al., 1991; Lipke and Kurjan, 1992). Neither the purified α-agglutinin 155-351 fragment nor the unpurified Arg-C digest of α-agglutinin 20-351 retained activity, despite the retention of most of the secondary structure in the cleaved product. The inactivity of the cleaved product implies that regions of domains I and/or II are also essential for binding. Such contribution of multiple domains to the binding site is the rule in the Ig superfamily, with few exceptions (Williams and Barclay, 1988).
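The lower bound on the β-sheet content of domains I and II stated earlier in this paragraph can be checked with a short calculation. The Python sketch below uses the residue ranges and the 70% overall sheet content given in the text. It makes one assumption that the text does not state: residues 327-351, which lie outside the three assigned domains, are taken to contribute no β-sheet. The function and variable names are hypothetical.

```python
def min_sheet_fraction_domains_1_2(total_residues, overall_sheet_fraction,
                                   domain3_residues, domains_1_2_residues):
    """Lower bound on the sheet fraction of domains I + II.

    Assumes domain III is 100% sheet and that no other residues outside
    domains I and II contribute sheet (an assumption of this sketch).
    """
    sheet_total = overall_sheet_fraction * total_residues
    sheet_left_for_1_2 = sheet_total - domain3_residues
    return max(0.0, sheet_left_for_1_2 / domains_1_2_residues)


# Residue ranges taken from the text.
fragment_len = 351 - 20 + 1      # α-agglutinin 20-351: 332 residues
domains_1_2_len = 199 - 20 + 1   # domains I and II, residues 20-199: 180
domain3_len = 326 - 200 + 1      # domain III, residues 200-326: 127

bound = min_sheet_fraction_domains_1_2(
    fragment_len, 0.70, domain3_len, domains_1_2_len
)
print(f"Minimum sheet content of domains I + II: {bound:.1%}")
```

Under these assumptions the bound is about 59%, which is in line with the statement that domains I and II must be at least 50% β-sheet. Assigning some sheet to residues 327-351 would lower the bound accordingly.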
Disulfide Bonds and Free Sulfhydryls in α-Agglutinin 20-351 - Cys 97 and Cys 114 form an interdomain disulfide bond between the proposed COOH terminus of domain I and the NH2 terminus of domain II. Interdomain disulfides are known in other members of the Ig superfamily, including the lymphoid differentiation antigen CD33 (Simmons and Seed, 1988), the B cell adhesion molecule CD22 (Stamenkovic and Seed, 1990), and the myelin-associated glycoprotein (Pedraza et al., 1990), but α-agglutinin is unique in the position of the bond between the F and B strands on sequential domains.
There are four cysteine residues in domain III, in the A, B, C′, and F strands. Intradomain disulfide linkages in Ig-like domains often form between cysteines of the B and F strands (Williams and Barclay, 1988). Although Cys 227 and Cys 300 are aligned in positions for the consensus intradomain disulfide bond, Cys 202 in strand A and Cys 300 in strand F form the actual disulfide linkage. The position of the disulfide Cys residues is not as highly conserved in the Ig superfamily as it is in the antibodies themselves. In domain I of myelin-associated glycoprotein, residues in strands B and E of the IgV domain form an intrasheet disulfide linkage (Pedraza et al., 1990). In domain II of CD4, there is a disulfide between strands C and F (Ryu et al., 1990; Wang et al., 1990). Thus, the bond between the A and F strands in domain III of α-agglutinin is a new position for intradomain disulfides in the Ig superfamily. These strands are close enough to allow formation of the bond.
Cys 227 in strand B and Cys 256 in strand C′ of domain III of α-agglutinin 20-351 are free sulfhydryls and can be derivatized under nonreducing conditions. However, they appear not to be exposed to solvent, since they were derivatized only under denaturing conditions (data not shown). A free sulfhydryl is present in at least one other member of the Ig superfamily. CD8α has a single IgV domain with three Cys residues, one of which was in the reduced state in the crystal structure (Leahy et al., 1992). As in α-agglutinin, all Cys residues are buried in the interior of the domain.

FIG. 8. Summary of sequenced α-agglutinin 20-351 peptides. Regions sequenced from tryptic and S. aureus V8 peptides are underlined with solid or wavy lines, respectively. Sulfhydryl groups are labeled (SH) and disulfide bonds are marked. Identified O-linked glycosylation sites are marked (solid diamonds). Potential N-glycosylation sites are italicized and stricken out; the two identified N-glycosylation sites are marked (stacked solid diamonds).

TABLE VI
Identification of glycosylated residues in α-agglutinin 20-351
Tryptic or S. aureus V8 peptides were isolated and sequenced. Listed sequences had residues missing that were predicted to be Asn, Ser, or Thr. X represents missing residues, and the sequence deduced from the nucleotide sequence of the gene follows in lowercase letters.

Glycosylation in α-Agglutinin 20-351 - α-Agglutinin is both N- and O-glycosylated (Terrance et al., 1987; Wojciechowicz et al., 1993; Lu et al., 1994). Our N-glycosylation results conform to the finding that Asn-Xaa-Thr sequences are preferred over Asn-Xaa-Ser as N-glycosylation sites in yeast (Moehle et al., 1987; however, see Riederer and Hinnen (1991)), in that of the three sequenced sites, the two Asn-Xaa-Thr sequences were glycosylated and the Asn-Xaa-Ser sequence was not. The sites of N-glycosylation, between β-strands C and C′ and between strands F and G of domain III, are common in members of the Ig superfamily (Dwek et al., 1993).
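Potential N-glycosylation sites of the kind discussed here (Asn-Xaa-Thr and Asn-Xaa-Ser) can be located in a sequence by a simple pattern scan. The Python sketch below is illustrative only. The exclusion of Pro at the Xaa position is a commonly used rule of thumb that is not stated in the text, the sequence is a made-up example rather than α-agglutinin, and the function name is hypothetical.

```python
import re

# Asn-Xaa-Ser/Thr sequon. Excluding Pro at Xaa is an assumption of this
# sketch. The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=N[^P]([ST]))")


def find_sequons(seq, first_residue_number=1):
    """Return (position of Asn, tripeptide, third residue) for each sequon."""
    hits = []
    for m in SEQUON.finditer(seq):
        start = m.start()
        hits.append(
            (start + first_residue_number, seq[start:start + 3], m.group(1))
        )
    return hits


# Hypothetical toy sequence, numbered from residue 1.
toy = "MANKTLLNGSAANPTQNNTS"
sites = find_sequons(toy)
thr_sites = [s for s in sites if s[2] == "T"]  # Asn-Xaa-Thr
ser_sites = [s for s in sites if s[2] == "S"]  # Asn-Xaa-Ser
for pos, tripeptide, third in sites:
    print(pos, tripeptide, third)
```

A scan of this kind only lists candidate sites. Whether a given site is actually used has to be determined experimentally, as was done here by peptide sequencing.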
There is at least one other N-glycosylated residue in α-agglutinin 20-351. Endo H treatment converts the 21-kDa Arg-C digestion fragment to the 16-kDa fragment, so Asn 79, Asn 109, or Asn 135 must be glycosylated. The 5-kDa size difference would accommodate fewer than 30 carbohydrate residues, the equivalent of a single N-linked chain in yeast (Hames, 1990; Klis, 1994). The glycosylated residue is probably Asn 109, because it is the only Asn-Xaa-Thr sequence in this part of the molecule, and we have repeatedly failed to obtain the sequence from this residue (peptides T1, DT1, and DS2). O-Glycosylation is common for cell surface proteins, with O-linked oligosaccharides often in Ser/Thr-rich regions. Many known cell surface O-glycosylated proteins, like low density lipoprotein receptor (Goldstein et al., 1985), decay-accelerating factor (Reddy et al., 1989), the muscle-specific isoform of N-CAM (Walsh et al., 1989), and yeast Gas1p/Gpp1p (Gatti et al., 1994), contain clusters of Ser/Thr-enriched segments in the regions proximal to the membrane. Expression of low density lipoprotein receptor and decay-accelerating factor in mutant cells defective for O-glycosylation results in a rapid cleavage of the binding region from the extracellular surface (Kozarsky et al., 1988; Reddy et al., 1989). In α-agglutinin, the region rich in hydroxy amino acids extends from about residue 300 (the F-strand Cys of domain III) to the COOH-terminal signal for GPI anchor addition at approximately residue 627 (Kodukula et al., 1993; Wojciechowicz et al., 1993).
α-Agglutinin expressed in the presence of tunicamycin, which inhibits N-glycosylation, reacts with ConA, indicating the presence of O-linked mannose residues (Terrance et al., 1987). This binding is not due to reaction with modified GPI anchors, because truncated fragments of α-agglutinin lacking the GPI anchor signal also bind ConA (Terrance et al., 1987; Hauser and Tanner, 1989; Wojciechowicz et al., 1993). The pattern of O-glycosylation in α-agglutinin 20-351 indicates that there are multiple sites glycosylated after residue 282, which is at the NH2-terminal end of the E strand of domain III. O-Glycosylation is predicted to continue through the Ser/Thr-rich sequence which extends to about residue 620. Six additional Asn-Xaa-Thr sequences in this Ser/Thr-rich region are probably glycosylated based on molecular size of truncated α-agglutinin species before and after treatment with endo H (Wojciechowicz et al., 1993). This highly glycosylated region (residues 300-627) would form a "stalk" holding the active site out from the wall surface, consistent with electron micrographs (Jentoft, 1990; Cappellaro et al., 1994). Finally, the stalk is predicted to continue to the COOH-terminal GPI anchor, which is processed in vivo to allow linkage to cell wall polysaccharides (Lu et al., 1994, 1995).

FIG. 9. Alignment of three domains of α-agglutinin with each other and with a consensus sequence for IgV domains (Williams and Barclay, 1988). The positions of the β-strands in the consensus sequence are shown. The alignment is based on secondary structure prediction and alignment within prospective β-strands, with gaps allowed only between strands (Chou and Fasman; Lipke et al., 1995). The sequence between residues 101 and 110 is repeated as the G strand of domain I and the A strand of domain II, as discussed in the text. Identities are boxed and shaded, similarities are boxed without shading.
A drawing of α-agglutinin shows three sequential Ig domains, with N-glycosylation in sites common for such domains (Fig. 10). The binding site includes residues in domain III and at least one other region. The disulfide bonds between domains I and II and between the A and F strands in domain III are unique among Ig domains, and there are two free sulfhydryls in domain III. Following the Ig domains, there is a heavily N- and O-glycosylated stalk sequence, and the COOH terminus of the protein is initially GPI anchored. Therefore α-agglutinin has a structure that recapitulates many of the features of cell adhesion proteins in multicellular eukaryotes.

FIG. 10. Structure of α-agglutinin. The standard "C"-shaped models of Ig domains are shown, with the B and F strand Cys residues at the points of the C (Williams and Barclay, 1988). The first two domains are fused to designate the shared strand. Cys residues are shown in their approximate positions, as are N-glycosylation sites at Asn 248 and Asn 306. N-Glycosylation sites COOH-terminal to Asn 348 have the sequence Asn-Xaa-Thr and are assumed to be used, based on the sizes of truncated forms of α-agglutinin (Wojciechowicz et al., 1993). Another possible N-glycosylation site at Asn 109 is not shown. Only representative O-glycosylations are shown.