Identification of Disulfide Bonds among the Nine Core 2 N-Acetylglucosaminyltransferase-M Cysteines Conserved in the Mucin β6-N-Acetylglucosaminyltransferase Family

Bovine core 2 β1,6-N-acetylglucosaminyltransferase-M (bC2GnT-M) catalyzes the formation of all mucin β1,6-N-acetylglucosaminides, including core 2, core 4, and blood group I structures. These structures expand the complexity of mucin carbohydrate structure and thus the functional potential of mucins. The four known mucin β1,6-N-acetylglucosaminyltransferases contain nine conserved cysteines. We determined the disulfide bond assignments of these cysteines in [³⁵S]cysteine-labeled bC2GnT-M isolated from the serum-free conditioned medium of Chinese hamster ovary cells stably transfected with a pSecTag plasmid. This plasmid contains bC2GnT-M cDNA devoid of the 5′ sequence coding for the cytoplasmic tail and transmembrane domain. The C18 reversed phase high performance liquid chromatographic profile of the tryptic peptides of reduced-alkylated …

There are two types of mucins (1,2): secreted and membrane-bound. MUC2, MUC5AC, and MUC5B are representative secreted mucins, whereas MUC1, MUC4, leukosialin, and P-selectin glycoprotein ligand-1 are examples of membrane-bound mucins (3,4). Secreted mucins are produced by epithelial mucous cells and play important roles in the rheological and bacteria-binding properties of the mucus covering epithelial tissues (5,6). Membrane-bound mucins are found at the cell surface throughout the body (3,4). They can modulate immune functions, such as B cell maturation and leukocyte trafficking during the inflammatory response (3,4). The biological properties of both secreted and membrane-bound mucins are attributed to the structurally heterogeneous carbohydrates covalently bound to their peptide backbones.
These β1,6-GlcNAc transferase (β6GnT) isozymes differ in their nucleotide and amino acid sequences, tissue distribution, and the carbohydrate structures they are able to form (9). Despite these differences, all β6GnTs contain nine conserved cysteines (9,12). In an effort to elucidate the structural determinants that account for the differences in substrate specificity among members of this gene family, we characterized the disulfide linkages formed among these nine conserved cysteines in bC2GnT-M. To facilitate this effort, we generated a secreted form of recombinant bC2GnT-M by removing the N-terminal region that contains the cytoplasmic tail and transmembrane domain and cloned the cDNA into pSecTag2B, which contains the Ig κ-chain leader sequence at the N terminus and a Myc epitope and polyhistidine tag at the C terminus. By microsequencing the [³⁵S]cysteine-containing tryptic peptides separated by reversed phase high performance liquid chromatography (RP-HPLC), we identified four cystine pairs: between the first and sixth, the third and seventh, the fourth and fifth, and the eighth and ninth cysteine residues. The second cysteine was not disulfide-bonded. This pattern of disulfide bond distribution differs from that recently reported for mouse C2GnT-L (12). The results indicate that conservation of the nine cysteines does not lead to the same disulfide bonds in different isozymes, suggesting that other factors, such as secondary structure, may play a crucial role in determining disulfide bond formation and substrate specificity. Molecular modeling based on the disulfide bond distribution, using a fold recognition/threading method to search the Protein Data Bank, showed a match with the crystal structure of aspartate aminotransferase (20,21). This template permits proper spatial arrangement of the cysteines involved in the four cystine pairs determined for bC2GnT-M.
The structure is different from both the glycosyltransferase B-fold structure proposed for mouse C2GnT-L (12,22) and the glycosyltransferase A-fold, the major protein fold proposed for glycosyltransferases (22,23).
… (17) by PCR using 5′ and 3′ primers containing EcoRI and KpnI restriction sites, respectively. The PCR product was first cloned into a Pichia vector (pPIC6αC) (Invitrogen) and then transferred via the EcoRI and NotI sites to the pSecTag2B vector, which contains the Ig κ-chain leader at the N terminus and a Myc and a His6 tag at the C terminus. After confirmation by sequencing and by an enzyme activity assay of the recombinant protein following transient transfection of CHO cells by published methods (24), pSecTag2B-bC2GnT-M was used to generate stable clones in CHO cells as described next.
After the CHO cells were cultured in Ham's F-12 medium plus 10% fetal bovine serum to 70% confluence, they were transfected under serum-free conditions with pSecTag2B-bC2GnT-M delivered with Lipofectin supplemented with insulin, as previously described (24). Two days later, cells were split 1:4 and cultured in Ham's F-12 medium plus 10% fetal bovine serum and 300 μg/ml Zeocin (Invitrogen). After 10 days, 24 clones were picked and characterized for C2GnT activity. The clone that expressed the highest C2GnT activity after four passages was used for the current study.
Assay of C2GnT-L, C4GnT-M, and IGnT Activities of Recombinant bC2GnT-M-The recombinant bC2GnT-M generated by the CHO cells stably transfected with pSecTag bC2GnT-M was assayed for C2GnT-L, C4GnT-M, and IGnT activities in the cells and conditioned medium as previously described (17). The conditioned medium was first concentrated 10-fold at 4°C by centricon filtration with a 30-kDa molecular weight cut-off membrane (Millipore Corp.).
Metabolic Labeling of bC2GnT-M-The [³⁵S]cysteine (or [³⁵S]methionine)-labeled bC2GnT-M was prepared from CHO cells stably transfected with pSecTag2B-bC2GnT-M as follows. CHO cells grown in T-75 flasks to 90% confluence in Medium A (see "Cell Culture") were switched to 12 ml of serum-free Medium A containing 2 mM sodium butyrate and cultured for 6-7 h. The cells were then exposed for 1 h to Dulbecco's modified Eagle's medium (catalog no. 21013-024; Invitrogen) supplemented with 2 mM L-glutamine, 1 mM sodium pyruvate, 2 mM butyrate, and either 15 μg/ml methionine (for preparing [³⁵S]cysteine-labeled bC2GnT-M) (25) or 24 μg/ml L-cysteine-HCl (for preparing [³⁵S]methionine-labeled bC2GnT-M). After 63 μl of [³⁵S]cysteine-HCl (11 mCi/ml at 1,075 Ci/mmol) or 22 μl of [³⁵S]methionine (10 mCi/ml at 540 Ci/mmol) (ICN) was added to each T-75 flask and the cells were incubated for 1-2 h, the medium was replaced with serum-free Medium A containing 2 mM butyrate. After the cells were cultured for 24-48 h, the conditioned medium was harvested, centrifuged at 1,000 × g for 5 min to remove cell debris, and used for purification.
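The amount of tracer added per flask follows directly from the figures above: activity = volume × radioactive concentration, and moles = activity ÷ specific activity. The Python sketch below simply restates that arithmetic; the helper name `tracer_added` is illustrative and not part of the original protocol.

```python
def tracer_added(volume_ul, conc_mci_per_ml, specific_activity_ci_per_mmol):
    """Return (activity in mCi, amount in nmol) of a radiolabeled tracer."""
    activity_mci = volume_ul / 1000.0 * conc_mci_per_ml  # ul -> ml, times mCi/ml
    # mCi / (Ci/mmol) gives umol; multiply by 1000 to express nmol
    amount_nmol = activity_mci / specific_activity_ci_per_mmol * 1000.0
    return activity_mci, amount_nmol


cys_mci, cys_nmol = tracer_added(63, 11, 1075)  # [35S]cysteine-HCl per T-75 flask
met_mci, met_nmol = tracer_added(22, 10, 540)   # [35S]methionine per T-75 flask
print(f"[35S]Cys: {cys_mci:.3f} mCi, {cys_nmol:.2f} nmol")
print(f"[35S]Met: {met_mci:.3f} mCi, {met_nmol:.2f} nmol")
```

With the stated values this gives about 0.69 mCi (roughly 0.64 nmol) of [³⁵S]cysteine or 0.22 mCi (roughly 0.41 nmol) of [³⁵S]methionine per T-75 flask, ignoring radioactive decay since the reference date.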
Purification of ³⁵S-labeled bC2GnT-M-The ³⁵S-labeled bC2GnT-M was purified from the combined supernatant in a two-step process. First, the medium was concentrated at 4°C from 180 to 10 ml using an Amicon YM 30 centricon (Amicon Bioseparations Centriprep; Millipore) in a centrifuge (Jouan model MR 22i) at 1,500 × g for 30 min. Two ml of nickel-nitrilotriacetic acid metal affinity resin (Qiagen), which has a protein-binding capacity of 5-10 mg/ml of resin, was added to the concentrated medium. After gentle shaking at 4°C overnight, the resin was packed in a column. The packed resin was rinsed successively with the supernatant (twice), 10 ml of 10 mM imidazole (pH 8.0), 10 ml of 20 mM imidazole (pH 8.0), and 10 ml of 20 mM imidazole (pH 6.2), and the protein was then eluted with 300 mM imidazole buffer (pH 8.0) (26) and collected in 1.5-ml fractions.
SCHEME 1. Disulfide bonds of the nine conserved cysteines in C2GnT-M.
Polyacrylamide Gel Electrophoresis and Western Blot Analysis-The purity of the recombinant bC2GnT-M purified on a nickel-nitrilotriacetic acid column was analyzed by SDS-10% PAGE under reducing conditions followed by Coomassie Blue staining or Western blotting with anti-Myc antibody (1:500) (Invitrogen). The anti-Myc antibody-treated membrane was then treated with horseradish peroxidase-conjugated secondary antibody (1:1000) (Invitrogen) and developed with ECL (Amersham Biosciences).
Alkylation and Reduction-Alkylation of Recombinant bC2GnT-M-To prepare bC2GnT-M with free cysteines alkylated, 20-25 μg of recombinant protein in 1.5 ml of elution buffer in a silanized tube was treated with 10 mM iodoacetamide in the dark at 37°C for 30 min (27). To prepare bC2GnT-M with all cysteines alkylated, the same amount of the recombinant bC2GnT-M was treated first with 15 mM dithiothreitol under argon gas at 37°C for 2 h and then 10 mM iodoacetamide for 30 min.
Trypsin Digestion of bC2GnT-M-Both alkylated and reduced-alkylated bC2GnT-M (90 μg in 1.5 ml of elution buffer adjusted to pH 8.0 with 1 M Tris-HCl buffer in silanized polypropylene tubes) were digested for 16 h with 50 μg of diphenylcarbamyl chloride-treated trypsin in 50-150 mM Tris-HCl (pH 8) containing 5 mM CaCl2 (25,27). Digestion continued for 4 h after the addition of another 50 μg of trypsin (67 μg/ml final concentration), with the pH maintained at 8.0 with 1 M Tris-HCl, pH 8.0. Samples were then centrifuged (1,500 × g), and the supernatant was kept at 4°C prior to HPLC separation of the tryptic peptides.
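Which cysteine-containing peptides a tryptic digest yields can be anticipated from trypsin's well-known specificity: cleavage on the C-terminal side of Lys or Arg, usually not when the next residue is Pro. The Python sketch below applies that simplified rule to a short, made-up sequence (not bC2GnT-M) and reports the position of each cysteine within its peptide, the quantity that Edman sequencing locates. It assumes complete digestion; missed cleavages, which the Results invoke to explain some unassigned radiolabeled peaks, are ignored, and the function names are hypothetical.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence at trypsin sites (simplified rule).

    Cleaves after K or R unless the next residue is P; assumes complete
    digestion (no missed cleavages).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def cysteine_positions(peptides):
    """Map each cysteine-containing peptide to its 1-based Cys positions."""
    return {p: [j + 1 for j, aa in enumerate(p) if aa == "C"]
            for p in peptides if "C" in p}


# Hypothetical sequence for illustration only (not bC2GnT-M).
peptides = tryptic_peptides("ACDKCPRGCLRKPMCAR")
print(peptides)
print(cysteine_positions(peptides))
```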
Tryptic Mapping Strategy-The tryptic mapping strategy consisted of three steps (25-27). First, the [³⁵S]cysteine-containing bC2GnT-M was fully reduced and alkylated and then digested with trypsin. The tryptic peptides were separated by C18 RP-HPLC. Those HPLC fractions containing cysteine were identified by virtue of their radiolabel and pooled, and the attendant peptides were identified by Edman degradation. In the second step, the [³⁵S]cysteine-labeled bC2GnT-M was digested with trypsin without prior reduction and alkylation. The tryptic digests were subjected to chromatography on C8 RP-HPLC. In the third step, each [³⁵S]cysteine-containing peak from the C8 chromatographic profile was reduced, alkylated, and rechromatographed by C18 RP-HPLC. The cysteines involved in cystine pairing were identified by comparing the profile with that obtained in step one. In this study, [³⁵S]methionine labeling was also employed to identify the peptides that contain both cysteine and methionine.
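The logic of steps two and three can be expressed as a small classification: a nonreduced (C8) peak that re-elutes as a single C18 peak after reduction held a free cysteine, whereas one that splits into two C18 peaks held a disulfide-linked peptide pair. The Python sketch below encodes the C8-to-C18 correspondences reported for Fig. 3 in Results; it assumes each tryptic peptide carries exactly one cysteine (an intrapeptide disulfide would also give a single peak), and the function name `infer_pairs` is hypothetical.

```python
# C8 peak (nonreduced digest) -> C18 peaks observed after reduction/alkylation,
# as reported for Fig. 3B-G.
C8_TO_C18 = {
    "I": ["a2"],
    "II": ["c"],
    "III": ["e", "f"],
    "IV": ["a1", "g"],
    "V": ["d", "h"],
    "VI": ["b", "i"],
}


def infer_pairs(c8_to_c18):
    """Classify each nonreduced peak as a free thiol or a cystine pair.

    Assumes every tryptic peptide carries exactly one cysteine, so one C18
    peak means a free (alkylated) cysteine and two C18 peaks mean two
    peptides joined by one disulfide bond.
    """
    free, pairs = [], []
    for c8_peak, c18_peaks in c8_to_c18.items():
        if len(c18_peaks) == 1:
            free.append(c18_peaks[0])
        elif len(c18_peaks) == 2:
            pairs.append(tuple(sorted(c18_peaks)))
        else:
            raise ValueError(f"C8 peak {c8_peak} is ambiguous: {c18_peaks}")
    return sorted(free), sorted(pairs)


free, pairs = infer_pairs(C8_TO_C18)
print("free:", free)    # peptides whose cysteine was not disulfide-bonded
print("pairs:", pairs)  # peptide pairs linked by a cystine bridge
```

Translating these peptide labels into residue numbers requires the peptide sequences of Table I, which are not reproduced here.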

RP-HPLC Separation of Tryptic Peptides from Reduced-Alkylated bC2GnT-M Labeled with [³⁵S]Cysteine or [³⁵S]Methionine-A C18 column (0.46 × 25 cm) (Vydac; 300 Å, 5 μm) was first used to establish the profile of the fully reduced and alkylated tryptic peptides and then to identify the cysteines involved in cystine pairing. Tryptic peptides prepared from reduced-alkylated bC2GnT-M labeled with [³⁵S]cysteine or [³⁵S]methionine were injected onto the C18 column equilibrated with 0.1% trifluoroacetic acid (buffer A) at 42°C. The column was eluted isocratically at 1 ml/min with buffer A for 3 min, followed by an acetonitrile gradient at 0.32%/min for 100 min and 4.2%/min for 15 min, and then re-equilibrated with buffer A. One-minute fractions were collected in silanized polypropylene tubes containing 4.5 μg of myoglobin/tube as carrier (25,27). The fractions were monitored by liquid scintillation counting. The fractions containing ³⁵S … The ³⁵S-containing fractions collected and analyzed as described above were concentrated by Speed-Vac, reconstituted in 1.5 ml of 150 mM Tris-HCl (pH 8.4) containing 20 mM dithiothreitol under argon gas, and incubated at 37°C for 3-5 h. Free thiols were then alkylated with 15 mM iodoacetamide under subdued light (26,27). The carboxymethylated [³⁵S]cysteine (or methionine)-containing peptides were then separated by C18 RP-HPLC as described above.
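Because the column runs at 1 ml/min and one-minute fractions are collected, each fraction can be assigned a nominal acetonitrile concentration from the gradient program above. The Python sketch assumes the gradient starts at 0% acetonitrile and ignores the system's gradient delay (dwell) volume, neither of which is specified in the text; the function names are illustrative.

```python
def acetonitrile_percent(t_min):
    """Nominal % acetonitrile at time t (min) for the C18 program above.

    Assumes the gradient starts from 0% acetonitrile (buffer A only) and
    does not model the re-equilibration that follows the last segment.
    """
    isocratic = 3.0                 # min of buffer A at 1 ml/min
    seg1_rate, seg1_len = 0.32, 100.0   # %/min, min
    seg2_rate, seg2_len = 4.2, 15.0     # %/min, min
    if t_min <= isocratic:
        return 0.0
    t = t_min - isocratic
    if t <= seg1_len:
        return seg1_rate * t
    t -= seg1_len
    return seg1_rate * seg1_len + seg2_rate * min(t, seg2_len)


def fraction_percent(n):
    """Nominal % acetonitrile at the midpoint of 1-min fraction n (1-based)."""
    return acetonitrile_percent(n - 0.5)


for minute in (3, 53, 103, 118):
    print(minute, round(acetonitrile_percent(minute), 2))
```

Under these assumptions the shallow segment ends at 32% acetonitrile (minute 103) and the steep segment at 95% (minute 118).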
Amino Acid Sequencing-[³⁵S]Cysteine- or [³⁵S]methionine-labeled tryptic peptides were concentrated to less than 50 μl using a Speed-Vac concentrator (Savant). Each sample was loaded onto a Polybrene-coated, trifluoroacetic acid-treated cartridge filter (Applied Biosystems) and sequenced using a pulsed-liquid protein sequencer (Applied Biosystems model 477A). After each cycle of Edman degradation, the released amino acid derivatives were collected and analyzed by liquid scintillation counting to determine the position(s) of radiolabeled cysteine or methionine in each peptide (28). Amino acid sequencing was performed in the protein sequencing facility at the University of Nebraska Medical Center (Omaha, NE).
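Localizing the label amounts to finding the Edman cycle at which radioactivity is released. The Python sketch below uses a hypothetical calling rule (a local maximum at least threefold above the median count) and made-up counts; the criterion actually used in this study is not stated. Real data also show carryover (lag) into the following cycle, which the local-maximum test is meant to suppress.

```python
from statistics import median


def label_positions(cpm_per_cycle, fold_over_background=3.0):
    """Return 1-based Edman cycles at which the radiolabel is released.

    Hypothetical rule: a cycle is called when its cpm exceeds
    fold_over_background times the median cpm of all cycles and is a
    local maximum, so lag into the next cycle is not reported as a
    second labeled position.
    """
    background = median(cpm_per_cycle)
    hits = []
    for i, cpm in enumerate(cpm_per_cycle):
        prev_cpm = cpm_per_cycle[i - 1] if i > 0 else 0
        next_cpm = cpm_per_cycle[i + 1] if i + 1 < len(cpm_per_cycle) else 0
        if (cpm > fold_over_background * background
                and cpm >= prev_cpm and cpm >= next_cpm):
            hits.append(i + 1)
    return hits


# Made-up counts for a peptide with its labeled cysteine at position 2.
counts = [120, 2400, 650, 210, 150, 130, 110, 100]
print(label_positions(counts))
```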
Bio-Gel P-4 Column Chromatography-A Bio-Gel P-4 (200-400 mesh) column (1 × 50 cm) was employed to separate the two cysteine-containing peptides that co-eluted in peak a (see Fig. 2) of the C18 RP-HPLC chromatogram of the tryptic peptides prepared from reduced-alkylated bC2GnT-M. The column was eluted with water at 1 ml/min, and 1-ml fractions were collected. Fractions were analyzed by liquid scintillation counting to localize the [³⁵S] …
Fold Recognition and Molecular Modeling of bC2GnT-M-Because no appropriate templates (with sequence similarity greater than 30%) were available for homology modeling, the "inverse folding" approach (29) was used to identify a set of known three-dimensional protein structures compatible with our sequence of interest. The Matchmaker module of the SYBYL 6.8 software package (TRIPOS, Inc., St. Louis, MO) was used to find crystal structures in the RCSB Protein Data Bank (available on the World Wide Web at www.pdb.org) with three-dimensional folds that match the structural properties of the bC2GnT-M sequence. Matchmaker examines the propensities of amino acid residues in the protein sequence to occupy a certain environment (solvent-exposed or buried), finds the optimal alignment (frozen or thawed mode) of the sequence to the "structural fingerprint" describing the three-dimensional environment at each residue position, and estimates pseudoenergy scores for different protein folds. Three sets of gap penalties, corresponding to Standard, Restrictive, or Permissive parameters, were used to scan the structural data base. Matchmaker and SYBYL graphical interfaces were used to analyze the results. Finally, the Biopolymer module of SYBYL was used to build and analyze a structural model of the bC2GnT-M molecule.

RESULTS
Purification and Characterization of the Recombinant bC2GnT-M Secreted into the Medium-We found that the recombinant enzyme secreted into the medium was fully active. However, the relative activity of the recombinant bC2GnT-M toward the three acceptors, core 1, core 3, and blood group i oligosaccharides, changed from 0.7/1.0/0.4 in wild-type bC2GnT-M (17) to 6.0/1.0/1.0 in recombinant bC2GnT-M. Treatment with dithiothreitol (2.5 mM) and β-mercaptoethanol (10 mM) did not affect enzyme activity. The yield of recombinant bC2GnT-M isolated from the serum-free conditioned medium on a nickel-nitrilotriacetic acid affinity column was about 1.5 μg/ml. Coomassie Blue staining of the SDS-PAGE gel of the purified recombinant protein showed a single band of about 58 kDa (Fig. 1), larger than the calculated molecular mass (52,479 Da) of the recombinant protein.
Western blot analysis using an anti-Myc antibody also showed one band. Treatment of the purified enzyme with N-glycanase, with or without sialidase A plus O-glycanase, decreased the size of the recombinant protein by about 4-5 kDa, suggesting that the recombinant protein was N-glycosylated at one or both of the two potential N-glycosylation sites, N-72 and N-108 (17). The lack of an apparent additional change in size after treatment with sialidase A plus O-glycanase suggests that the recombinant bC2GnT-M carries little or no O-glycan T antigen, with or without sialic acid.

RP-HPLC Tryptic Map of Recombinant bC2GnT-M Labeled with [³⁵S]Cysteine or [³⁵S]Methionine-The recombinant bC2GnT-M contains 10 cysteines, nine of which are conserved among all members of the β6GnT family (9). The amino acid sequences of the 10 cysteine-containing tryptic peptides of recombinant bC2GnT-M are listed in Table I (column 3). Microsequencing of these peptides, with liquid scintillation counting to locate the [³⁵S]cysteine residue in each peptide, identified only nine (peaks a-i) of the 10 expected cysteine-containing peptides (Fig. 2). The radiolabeled peaks that could not be identified may represent incompletely cleaved tryptic peptides. The peptide that contained cysteine at the 17th position was not detected, probably because the proline on the amino side of the cysteine inhibited the Edman degradation reaction. This peptide was also identified as peak I in the C8 RP-HPLC tryptic peptide map of alkylated bC2GnT-M prepared under nonreducing conditions (Fig. 3A). After rechromatography on the C18 column, this peak I peptide had the same retention time as peak a in Fig. 2 (Fig. 3B). This result suggested that peak a contained two cysteine-containing peptides, one with a cysteine at the second position and the other with a cysteine at the 17th position. This prediction was verified by chromatography of the peak a material from Fig. 2 on Bio-Gel P-4, which separated it into peaks a1 and a2 (Fig. 2, inset). Microsequencing confirmed that the smaller a2 peak was the peptide with cysteine at the second position. To positively verify the identity of the peak a1 peptide, which contains a methionine at position 10, a C18 RP-HPLC tryptic peptide map was generated from reduced and alkylated bC2GnT-M metabolically labeled with [³⁵S]methionine (Fig. 4A). Amino acid sequencing followed by liquid scintillation counting showed that the peak a1 material (Fig. 4A) was the tryptic peptide containing methionine at position 10, indicating that peak a in Fig. 2 was a mixture of two peptides, one with a cysteine at the second position (a2) and one with a cysteine at the 17th position (a1).

FIG. 2. C18 RP-HPLC separation of ³⁵S-labeled tryptic peptides of reduced and alkylated bC2GnT-M metabolically labeled with [³⁵S]cysteine. Nickel-nitrilotriacetic acid affinity column-purified bC2GnT-M labeled with [³⁵S]cysteine was reduced, alkylated, trypsinized, and then separated on a Vydac C18 column with the acetonitrile gradient described under "Experimental Procedures." The ³⁵S-labeled peptide peaks are designated a-i according to their retention times. The identity of each peptide was determined from the position of the ³⁵S label released at each Edman degradation cycle, as measured by liquid scintillation counting. Peak a contains two ³⁵S-labeled peptides, which were separated into peaks a1 and a2 by Bio-Gel P-4 column chromatography and then analyzed by liquid scintillation counting. The chromatographic profile is shown in the inset.

… Fig. 3A were further reduced, alkylated, trypsinized, and run on C18 (B-G). HPLC fractions were collected and monitored by liquid scintillation counting. Peak II corresponds to peak c in Fig. 2. Peak III generated peaks e and f, peak IV produced peaks a1 and g, peak V yielded peaks d and h, and peak VI formed peaks b and i.

Identification of Free Cysteine and Disulfide-bonded Cystine-… labeled peak was subjected to C18 RP-HPLC after reduction and alkylation to identify the cysteines involved in the formation of each cystine pair. As shown above, the cysteine in peptide a2 was not involved in disulfide bond formation, because rechromatography of peak I material from the C8 tryptic peptide map (Fig. 3A) generated only a single peak (a2) on the C18 column (Fig. 3B). Likewise, the cysteine in peptide c (Fig. 2) was not involved in disulfide bond formation, because rechromatography on the C18 column of peak II material eluted from the C8 column of alkylated bC2GnT-M (Fig. 3A) yielded only a single peak (c) (Fig. 3C). The identity of peak c material, Cys55-Arg56, was further confirmed by C18 RP-HPLC of an alkylated Cys-Arg standard (data not shown). The four disulfide bonds were identified by rechromatography of peaks III-VI in Fig. 3A on the C18 column after reduction, alkylation, and trypsinization. Peak III yielded peaks e and f (Fig. 3D), indicating that these two cysteine-containing peptides were disulfide-linked. Similarly, peak IV produced peaks a1 and g (Fig. 3E), peak V generated peaks d and h (Fig. 3F), and peak VI formed peaks b and i (Fig. 3G). The disulfide bridge between peptides e and f, which contain methionine (Table I and Fig. 4A), was further confirmed by rechromatography on the C18 column (Fig. 4C) of peak III material from the C8 column fractions (Fig. 4B), which were produced by trypsinization of alkylated bC2GnT-M labeled with [³⁵S]methionine. Cysteine 55, which is not a conserved cysteine, was not involved in disulfide bridge formation. Among the nine cysteines conserved in every member of the mucin β6GnT family, the second cysteine (Cys113) is the only one not involved in disulfide bridge formation. The four disulfide bridges are formed between the first (Cys73) and sixth (Cys230), the third (Cys164) and seventh (Cys384), the fourth (Cys185) and fifth (Cys212), and the eighth (Cys393) and ninth (Cys425) cysteine residues. The other radiolabeled peaks in Fig. 4 may be incompletely cleaved tryptic peptides.
Molecular Modeling-Initially, we attempted to generate a three-dimensional structure of bC2GnT-M based on the GT-A and GT-B fold templates (22,23). However, neither protein fold could produce a structure that placed the cysteines close enough together to form the disulfide bonds. We therefore searched the Protein Data Bank for crystal structures that could accommodate the threading model of bC2GnT-M and the four disulfide bonds determined in our study. After carrying out Matchmaker runs (with Standard, Restrictive, or Permissive parameters and frozen or thawed alignments), we identified 10 crystal structures with the best pseudoenergy estimates. Three crystal structures, 1GOX (glycolate oxidase) (30), 1ELS (enolase) (31), and 2CST.A (aspartate aminotransferase) (20), were among the best scored proteins in every run. The spatial arrangement of the eight cysteine residues predicted to form the four disulfide bridges was taken as the criterion for further selection of the structural template for molecular modeling. By this criterion, the only protein structure with a three-dimensional fold that placed all cysteines involved in disulfide bridges in spatial proximity was the crystal structure of chicken cytosolic aspartate aminotransferase (2CST.A). Among the high-scoring proteins, another crystal structure, that of the chicken mitochondrial aspartate aminotransferase mutant K258H (1AKA) (21), gave even better positioning of the corresponding cysteines. Therefore, the aspartate aminotransferase crystal structures 2CST.A and 1AKA were used as templates for molecular modeling of bC2GnT-M. … bC2GnT-M to be located at the proper positions amenable to the formation of the four cysteine pairs (Fig. 5B).
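The template-selection criterion, spatial proximity of each cysteine pair in the threaded model, can be checked numerically once a model is built. The Python sketch below tests whether the Cα atoms of the four pairs determined here lie within a distance window; the 4.4-6.8 Å window is an assumed, commonly quoted range for Cα-Cα separations in disulfide-bonded cysteines, not a parameter from this study, and the coordinates are invented. It is a post hoc geometric check, not a reimplementation of Matchmaker's pseudoenergy scoring.

```python
import math

# Disulfide pairs determined in this study (bC2GnT-M residue numbers).
PAIRS = [(73, 230), (164, 384), (185, 212), (393, 425)]

# Assumed geometric window for disulfide-bonded C-alpha separations (angstroms).
CA_MIN, CA_MAX = 4.4, 6.8


def pairs_compatible(ca_coords, pairs=PAIRS, lo=CA_MIN, hi=CA_MAX):
    """Return (all_ok, {pair: distance}) for a threaded model.

    ca_coords maps residue number -> (x, y, z) of that residue's C-alpha.
    """
    distances = {p: math.dist(ca_coords[p[0]], ca_coords[p[1]]) for p in pairs}
    all_ok = all(lo <= d <= hi for d in distances.values())
    return all_ok, distances


# Invented coordinates standing in for a model built on a candidate template.
model = {73: (0, 0, 0), 230: (5, 0, 0), 164: (10, 0, 0), 384: (10, 6, 0),
         185: (20, 0, 0), 212: (20, 0, 5.5), 393: (30, 0, 0), 425: (30, 3, 4)}
ok, dists = pairs_compatible(model)
print(ok, {p: round(d, 2) for p, d in dists.items()})
```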
The molecular model of the bC2GnT-M fragment (residues 48-440), constructed on the aspartate aminotransferase template and constrained by the four disulfide bridges, is shown in Fig. 6.
DISCUSSION
We have determined the disulfide bonds among the nine bC2GnT-M cysteines conserved in the mucin β6GnT family members (9). We identified cysteine-containing peptides by locating the [³⁵S]cysteine in the tryptic peptides of [³⁵S]cysteine-labeled bC2GnT-M by amino acid sequencing. We found that the nine conserved cysteines in bC2GnT-M have a pattern of disulfide bonds different from that recently reported for mouse C2GnT-L (12). In that study, HPLC followed by mass spectrometry was employed to determine the distribution of disulfide bonds. In mouse C2GnT-L, the sixth conserved cysteine was free, whereas in bC2GnT-M the second conserved cysteine was the free thiol. The conserved cysteines involved in disulfide bonds were the first and ninth, the second and fourth, the third and fifth, and the seventh and eighth for mouse C2GnT-L, and the first and sixth, the third and seventh, the fourth and fifth, and the eighth and ninth for bC2GnT-M. Therefore, sharing nine conserved cysteines does not necessarily result in the same disulfide bridges. A similar observation has been reported for α(1,3/1,4)-fucosyltransferases III and VII (32,33), in which different disulfide bonds were formed from the four cysteines conserved in these two fucosyltransferases. The results suggest that other factors, such as secondary structure and protein fold, play a crucial role in directing the formation of disulfide bonds.
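The comparison between the two isozymes can be made explicit by writing each pattern as a set of pairs of conserved-cysteine ordinals (1-9). In this Python sketch the pairings are those stated in the text, and the residue numbers of the nine conserved bC2GnT-M cysteines are taken from Results; the residue numbering of mouse C2GnT-L is not given here, so only ordinals are used for it. The helper names are illustrative.

```python
# Residue numbers of the nine conserved bC2GnT-M cysteines, in sequence order.
CONSERVED_CYS = [73, 113, 164, 185, 212, 230, 384, 393, 425]

BC2GNT_M = {(1, 6), (3, 7), (4, 5), (8, 9)}        # this study
MOUSE_C2GNT_L = {(1, 9), (2, 4), (3, 5), (7, 8)}   # Yen et al. (12)


def free_cysteines(pairs, n=9):
    """Ordinals of conserved cysteines not assigned to any disulfide."""
    bonded = {c for pair in pairs for c in pair}
    return sorted(set(range(1, n + 1)) - bonded)


def to_residues(pairs, numbering=CONSERVED_CYS):
    """Convert ordinal pairs to residue-number pairs."""
    return sorted((numbering[a - 1], numbering[b - 1]) for a, b in pairs)


print("shared pairs:", BC2GNT_M & MOUSE_C2GNT_L)
print("free:", free_cysteines(BC2GNT_M), free_cysteines(MOUSE_C2GNT_L))
print("bC2GnT-M bridges:", to_residues(BC2GNT_M))
```

The two patterns share no disulfide pair, and each leaves a different conserved cysteine free (second versus sixth).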
The formation of different disulfide bonds in mouse C2GnT-L and bC2GnT-M probably underlies their difference in substrate specificity. Mouse C2GnT-L acts only on the core 1 acceptor, whereas bC2GnT-M can act on core 1, core 3, and blood group i acceptors. It will be of interest to know whether the disulfide bridge pattern of C2GnT-3, which has the same substrate specificity as C2GnT-L, matches that of C2GnT-L or differs from those of both C2GnT-L and C2GnT-M. As we previously pointed out (9), different β6GnT isozymes share 39-52% amino acid sequence identity, whereas the same β6GnT isozyme from different animal species displays 81-86% sequence identity. Characterization of the disulfide bond distribution of C2GnT-L from species other than mouse, and of C2GnT-3 from different species, should provide important clues to the factors that direct disulfide bond formation among the nine cysteines conserved in the mucin β6GnT family members (9,12).
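Identity figures such as 39-52% and 81-86% depend on how aligned positions are counted. The Python sketch below uses one common convention (identical residues divided by the number of gap-free aligned columns) on made-up aligned fragments; the convention used in ref. 9 may differ, and the alignment itself is assumed to be given. The function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    One common convention among several; treat it as illustrative.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Made-up aligned fragments: 5 identities over 6 gap-free columns.
print(round(percent_identity("ACDE-FG", "ACDQHFG"), 1))
```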
As recently reviewed by Coutinho et al. (22), over 7,200 glycosyltransferase (GT)-related sequences are in the data banks. There is no consensus structure among these glycosyltransferases (22,23). To date, each glycosyltransferase is defined by its nucleotide sugar donor, acceptor, and product. The recent explosion of new glycosyltransferase sequence data in the postgenomic era has outpaced classical biochemical characterization of new glycosyltransferases. As a result, the identity of most putative glycosyltransferases remains unestablished. In an attempt to group these glycosyltransferases by sequence similarity, 65 GT families have been identified (22). However, this approach has an inherent limitation, because the specificity of glycosyltransferases is determined by their three-dimensional structure. To date, crystal structures have been solved for only 11 glycosyltransferases (22-24), so only limited inferences can be drawn from the data base. Despite this limitation, the structures of these glycosyltransferases have been used to propose two GT superfamilies, GT-A and GT-B (22,23). The GT-A family, which includes eight of these glycosyltransferases, contains two tightly associated and abutting β/α/β domains that form a continuous central sheet of at least eight β-strands. The GT-A fold has also been described as a single-domain fold. In contrast, the GT-B fold contains two loosely associated Rossmann-like β/α/β domains that face each other, forming a cleft between them that accommodates the sugar donor and acceptor (34). These two protein folds have served as the model structures with which new GT structures are compared. It was observed that the nucleotide sugar binding site was located in the N-terminal domain of GT-A enzymes but in the C-terminal domain of GT-B enzymes, whereas the acceptor binding site was on the other domain.
It should be noted that the DXD motif, which was considered a signature sequence for GT-A enzymes (35), was found at a similar frequency in GT-A (71%) and GT-B (69%) enzymes (22). Therefore, the DXD motif itself could not be used as a reliable marker for identifying GT-A enzymes. Other factors that can be used as more reliable predictors of glycosyltransferase structure need to be developed.
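Part of the reason the DXD motif is a weak structural marker is that it is a very short pattern, so plain sequence scanning finds it readily. The Python sketch below lists every DXD occurrence (overlaps included) in a hypothetical sequence using a regular-expression lookahead; a hit says nothing by itself about whether a protein adopts a GT-A fold. The function name is illustrative.

```python
import re

# Zero-width lookahead so overlapping DXD occurrences are all reported.
DXD = re.compile(r"(?=(D.D))")


def find_dxd(sequence):
    """Return (1-based start, motif) for every DXD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DXD.finditer(sequence)]


# Hypothetical sequence used only for illustration.
print(find_dxd("MKDADLLDVDDGS"))
```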
Because secondary structure alone does not predict three-dimensional structure, which is an important determinant of glycosyltransferase conformation, disulfide bridges combined with secondary structure may provide the best basis for predicting the structure of a glycosyltransferase in the absence of a crystal structure. Using this structure prediction strategy, Yen et al. (12) found that mouse C2GnT-L fits the GT-B fold, which could accommodate all the cysteines at spatial locations enabling the formation of the identified cystine pairs. However, neither the GT-A nor the GT-B fold could accommodate all of the conserved bC2GnT-M cysteines participating in disulfide bonds at locations that make disulfide bond formation feasible. Instead, the three-dimensional structure of the chicken aspartate aminotransferase K258H mutant (21) provides the best fit for bC2GnT-M. This example suggests that protein folds other than the GT-A and GT-B folds, which were derived from the crystal structures of 11 glycosyltransferases, may exist for other glycosyltransferases. However, the significance of the high degree of proposed three-dimensional structural similarity between chicken aspartate aminotransferase and bC2GnT-M is not clear at present. These two enzymes catalyze their reactions by different mechanisms (e.g. a ping-pong mechanism for aspartate aminotransferase (20) and a sequential mechanism for bC2GnT-M (13)). An apparent modification of aspartate aminotransferase was detected during catalysis, but none was detected for bC2GnT-M. Despite these differences, the x-ray crystallographic structure of aspartate aminotransferase could help guide further structural characterization of bC2GnT-M. These results, together with those of Yen et al. (12), suggest that two C2GnT isozymes with nine conserved cysteines may have distinct three-dimensional structures.
The difference in three-dimensional structure between these two isozymes may explain their difference in acceptor specificity. Confirmation of these proposed structures will await determination of the x-ray crystal structures of these two β6GnT isozymes.