Identification of the Carbohydrate Moieties and Glycosylation Motifs in Campylobacter jejuni Flagellin*

Flagellins from three strains of Campylobacter jejuni and one strain of Campylobacter coli were shown to be extensively modified by glycosyl residues, imparting an approximate 6000-Da shift from the molecular mass of the protein predicted from the DNA sequence. Tryptic peptides from C. jejuni 81-176 flagellin were subjected to capillary liquid chromatography-electrospray mass spectrometry with high/low orifice stepping to identify peptide segments of aberrant masses together with their corresponding glycosyl appendages. These modified peptides were further characterized by tandem mass spectrometry and preparative high performance liquid chromatography followed by nano-NMR spectroscopy to identify the nature and precise site of glycosylation. These analyses have shown that there are 19 modified Ser/Thr residues in C. jejuni 81-176 flagellin. The predominant modification found on C. jejuni flagellin was O-linked 5,7-diacetamido-3,5,7,9-tetradeoxy-L-glycero-L-manno-nonulosonic acid (pseudaminic acid, Pse5Ac7Ac) with additional heterogeneity conferred by substitution of the acetamido groups.

Campylobacter spp. are among the most frequent causative agents of bacterial diarrhea worldwide and the leading cause of food-borne illness in North America (1, 2). Motility is an essential virulence determinant required for colonization of the gastrointestinal tract and invasion of intestinal epithelial cells in vitro (3). Moreover, Campylobacter jejuni flagellin is the immunodominant protein recognized during infection and has been suggested to be an immunoprotective antigen (4-7). The flagellar filaments of Campylobacter spp. are complex, composed primarily of the FlaA flagellin but with trace amounts of a highly homologous flagellin, FlaB (7). Flagellins from numerous strains of C. jejuni and the related organism, Campylobacter coli, have been shown to be glycosylated (8), and the modifications have been shown to occur on ≥13 serine residues on flagellin from a strain of C. coli (9). In addition, the glycosyl modifications are surface-exposed in the flagellar filament and appear to be highly immunogenic (10). Several genes involved in glycosylation of Campylobacter flagellin have been described (11-13). Two of these genes encode homologs of prokaryotic enzymes, sialic acid (Neu5Ac) synthase and CMP-Neu5Ac synthetase, which are involved in synthesis of Neu5Ac (11, 12). These observations, coupled with reports that a Neu5Ac-specific lectin can bind to Campylobacter flagellins, have led to the hypothesis that the glycosyl posttranslational modifications on flagellin include Neu5Ac moieties. Although protein glycosylation was previously considered to be restricted to eukaryotes and archaea, there are increasing examples of prokaryotic glycoproteins (14-16) including pilins from Neisseria spp. (17) and Pseudomonas aeruginosa (18) and flagellins from not only Campylobacter spp., but also Caulobacter crescentus (19) and P. aeruginosa (20). We have undertaken a comprehensive structural analysis of C. jejuni and C. coli flagellin by mass spectrometry and NMR spectroscopy to identify the precise nature of these modifications and the potential structural determinants underlying the selective glycosylation of Campylobacter flagellin. Furthermore, we have determined that a gene encoding a protein of unknown function in the C. jejuni genome sequence is involved in biosynthesis of one of the modifications.
Purification of Flagellin and Preparation of Tryptic Peptides-Campylobacter strains were grown in Mueller Hinton broth overnight at 37 °C under microaerobic conditions in batches of 1.6 liters. Flagellin was purified by the method of Power et al. (10). Approximately 2 mg of flagellin was digested overnight at 37 °C with trypsin (Promega, Madison, WI) and subsequently evaporated using a Speedvac preconcentrator. Preparative HPLC separations were conducted on a 25 × 1-cm C18 Vydac 218TP510 column (Hesperia, CA) using a gradient of 5-90% aqueous acetonitrile (0.1% trifluoroacetic acid) over 30 min. The UV detector was set to 214 nm. Replicate injections of 200 μg each were made on the column, and fractions were collected every 1 min over the course of the gradient elution. For β-elimination experiments, purified tryptic glycopeptides (previously identified by mass spectrometry) were incubated for periods of 6-16 h in a 25% aqueous solution of ammonium hydroxide. Samples were then evaporated to dryness on a Speedvac preconcentrator and reconstituted in water, and remaining salts were removed by passing the solution through a Millipore Ziptip C18 (Bedford, MA) and eluting the peptide with a 50% aqueous methanol solution (0.2% formic acid).
Mass Spectrometry-All mass spectra were obtained on a PerkinElmer/Sciex Q-Star mass spectrometer (Concord, ON, Canada) using liquid chromatography Tune and Biomultiview programs for data acquisition and processing, respectively. An HP 1100 liquid chromatograph was coupled to the Q-Star for cLC-MS experiments. Capillary LC separations were conducted with a 15-cm × 0.32-mm Pepmap C18 column (LC Packings, San Francisco, CA) using a linear gradient of 5-95% aqueous acetonitrile (0.2% formic acid) over 30 min. A pre-column flow splitter was mounted before a Rheodyne 8125 injector to provide a column flow rate of 3.5 μl/min. Injections of typically 1 μg of flagellin tryptic digests were made on the capillary column.
For conventional mass spectra of intact flagellin, ~1 μg of purified protein in 0.2% formic acid was flow-injected into a stream of 50% aqueous acetonitrile (0.2% formic acid). The mass spectrometer was set to record the range m/z 1000-2500. During the cLC-ESMS experiments, a stepped orifice (OR) voltage ramp was set over two distinct scanning periods (period 1: 1 s, OR 100 V, m/z 150-400; period 2: 2 s, OR 30 V, m/z 400-1800) for the acquisition of conventional mass spectra. Tandem mass spectrometry experiments were conducted on the Q-Star using a nanoelectrospray interface. HPLC fractions of tryptic glycopeptides before and after β-elimination were loaded in the open end of a nanoelectrospray type A emitter (Micromass, Manchester, UK). Collision-induced dissociation (CID) of selected precursor ions identified in a preliminary survey scan was achieved using nitrogen as the collision gas at collision energies of typically 50-90 eV (laboratory frame of reference). Fragment ions formed in the rf-only quadrupole were recorded by the time-of-flight mass analyzer.
NMR Spectroscopy-NMR experiments were performed on a Varian INOVA 600 NMR spectrometer with VNMR 6.1B software using a gradient inverse broadband nano-NMR probe at a spin rate of 2600 Hz (23, 24). HPLC-purified samples (200-500 μg) were evaporated on a Speedvac concentrator and redissolved in 40 μl of D2O with no adjustment of pH. NMR experiments were done at 25 °C with the HOD resonance set at 4.77 ppm. All experiments were performed as described before (25). For the analysis of the NOE data, coordinates for 5-acetamido-7-acetamido-3,5,7,9-tetradeoxy-L-glycero-α-L-manno-nonulosonic acid were generated using MM3(92) (QCPE) and InsightII (Molecular Simulations Inc.). The dihedral angles about the C6-C7-C8-C9 bonds were based on the major conformers found in solution (26, 27).
Cloning and Genetic Analyses of Flagellin Modification Genes-The 81-176 homolog of Cj1317 and adjacent DNA was cloned from a λ-ZAP Express library using as probe a polymerase chain reaction product generated from the Cj1317 sequence (22). DNA sequencing was done with dye terminator chemistry on an Applied Biosystems Model 373A sequencer. Mutants were constructed using an in vitro Tn5-based transposition system, EZ::Tn pMOD (Epicentre, Madison, WI), containing a Campylobacter chloramphenicol resistance (Cmr) gene from pRY109 as previously described (28). The transposon was polymerase chain reaction-amplified with primers specified by Epicentre and used in an in vitro transposition reaction with the target plasmid. The reaction was transformed into Escherichia coli DH5α, and plasmid DNAs from individual transformants were sequenced using primers within the Cmr cassette to determine the insertion point and orientation with respect to the target gene. Insertions into Cj1314c, Cj1315c, Cj1316c, and Cj1317 were electroporated into 81-176 (29) with selection on Mueller Hinton agar supplemented with 15 μg/ml chloramphenicol under microaerobic conditions. DNAs from individual Cmr colonies were subjected to polymerase chain reaction using primers bracketing the insertion site of each insertion to confirm that the mutated allele had integrated into the 81-176 chromosome by a double crossover.
To perform complementation in trans with the Cj1316c mutant, DNA encoding the wild type alleles from plasmid pSG1854 was subcloned from the excision plasmid of λ-ZAP into the kanamycin-resistant (Kmr) shuttle plasmid, pRY107 (30), to generate pRY107/1854. The construction was transformed into E. coli DH5α containing plasmid RK212.2 (31). Plasmid pRY107/1854 was conjugally mobilized by RK212.2 from E. coli into the Cj1316c mutant of 81-176 with selection on Mueller Hinton agar supplemented with 10 μg/ml trimethoprim, 15 μg/ml chloramphenicol, and 25 μg/ml kanamycin. As a control, pRY107 was also transferred into the Cj1316c mutant.
Motility Testing-Motility of mutants was compared with that of wild type on semi-solid (0.4%) Mueller Hinton agar plates as previously described (4).
Peptide Sequencing-Automated gas-phase amino acid sequencing was performed on an Applied Biosystems (Foster City, CA) model 491 Procise protein sequencer incorporating a model 140C microgradient system and a 785A programmable absorbance detector.
Characterization of Lipooligosaccharide Cores-Whole cells of C. jejuni strains were subjected to proteinase K digestion, electrophoresed on 16% Tricine gels (Invitrogen), and stained with silver (Bio-Rad) as described previously (28).

Analysis of Intact Campylobacter Flagellins-Electrospray mass spectrometry experiments on purified flagellin from C. jejuni 81-176 (Fig. 1) indicated that the molecular mass of the monomeric glycoprotein was ~10% higher than that predicted from the sequence of flaA, 59,240 Da (GenBank™ accession number AF345999). The reconstructed molecular mass profile obtained from the multiply protonated ions observed in the electrospray mass spectrum (Fig. 1, inset) showed two components at 65,766 and 65,841 Da with a broad peak profile of 600-700 Da, possibly reflecting the heterogeneity in the glycoform distribution. It is noteworthy that no peak was observed for the predicted, unmodified FlaA flagellin (59,240 Da), suggesting extensive modification on the protein backbone structure. Mass spectral analyses on flagellin purified from strains NCTC 11168 and OH 4384 also gave broad heterogeneous molecular mass envelopes ranging from 65,600 to 66,400 Da with a few discrete glycoform peaks. Similarly, flagellin obtained from C. coli VC167 T2 exhibited a broad molecular mass distribution extending from 64,500 to 65,400 Da, also showing a 6,000-Da mass excess from the predicted sequence (11). In all the intact C. jejuni and C. coli flagellins examined, the extent of post-translational modifications was substantial, and this resulted in a quantitative incorporation of glycosyl moieties imparting approximately a 10% molecular mass excess on the gene product (see below).
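The mass excess quoted above follows directly from the measured and predicted masses. As a minimal sketch (the function name `mass_excess` is ours, introduced only for illustration), the arithmetic for the two 81-176 components can be written as:

```python
# Masses in Da, taken from the text above.
PREDICTED_FLAA = 59_240           # predicted from the flaA sequence
OBSERVED = (65_766, 65_841)       # two components in the reconstructed profile


def mass_excess(observed, predicted):
    """Return (excess in Da, excess as a percentage of the predicted mass)."""
    excess = observed - predicted
    return excess, 100.0 * excess / predicted


for m in OBSERVED:
    da, pct = mass_excess(m, PREDICTED_FLAA)
    print(f"{m} Da: +{da} Da ({pct:.1f}% of predicted)")
```

Both components give an excess of roughly 6.5 kDa, in line with the approximately 10% figure stated in the text.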
cLC-ESMS of Tryptic Peptides from C. jejuni 81-176 Flagellin-To precisely assign the type and location of these posttranslational modifications, mass spectral experiments were performed using cLC-ESMS on tryptic peptides derived from purified flagellin. An OR voltage ramp was set over two distinct scanning periods (period 1: 1 s, OR 100 V, m/z 150-400; period 2: OR 30 V, m/z 400-1800) to identify characteristic oxonium ions (period 1) together with precursor ions from which these specific carbohydrate fragment ions were derived (period 2). Under these conditions, Neu5Ac-containing glycopeptides typically yield an abundant oxonium ion at m/z 292. Although this residue was originally suspected as an O-linked glycan, the cLC-ESMS with high/low OR stepping could not unambiguously confirm its presence. Rather, the combined cLC-ESMS analyses revealed the elution of tryptic peptides with unusual modifications as reflected by distinct fragment ions at m/z 317 (dotted line) and 409 (dashed line) obtained from the extracted ion chromatograms shown in Fig. 2a.
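The diagnostic-ion screen described above amounts to checking each high-orifice scan for a small set of known fragment m/z values. The sketch below is only an illustration of that idea and not the acquisition software used in the study; the function name, the tolerance, and the example peak list are hypothetical, and only the three reference m/z values come from the text.

```python
# Nominal oxonium ion m/z values named in the text (period 1, OR 100 V).
DIAGNOSTIC_IONS = {
    292.0: "Neu5Ac oxonium ion",
    317.0: "unassigned modification (m/z 317)",
    409.0: "unassigned modification (m/z 409)",
}


def find_diagnostic_ions(peaks, tol=0.5):
    """Return {label: max intensity} for diagnostic ions found in a scan.

    peaks: iterable of (m/z, intensity) pairs from one high-orifice scan.
    tol:   assumed matching tolerance in m/z units (illustrative value).
    """
    hits = {}
    for mz, intensity in peaks:
        for ref, label in DIAGNOSTIC_IONS.items():
            if abs(mz - ref) <= tol:
                hits[label] = max(hits.get(label, 0.0), intensity)
    return hits


# Hypothetical scan: intensities are invented for illustration only.
scan = [(204.1, 120.0), (317.1, 950.0), (409.2, 400.0)]
print(find_diagnostic_ions(scan))
```

Scans that return a hit would then be paired with the low-orifice scan at the same retention time to find the precursor ions carrying the modification.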
The occurrence of these fragment ions (period 1) was used as a diagnostic tool to identify which eluting tryptic glycopeptides observed in the total ion chromatogram (period 2, solid line in Fig. 2a) carried these unusual modifications. These analyses enabled the identification of at least 7 tryptic glycopeptides, some of which carried multiple modified residues. The number and types of substitution were calculated by comparing the mass of the modified tryptic peptides with those of the expected proteolytic fragments. For example, Fig. 2b shows the extracted mass spectrum of the peak eluting at 22.2 min in Fig. 2a. A number of co-eluting components were observed in Fig. 2b, including triply and doubly charged ions at m/z 838.1 and 1256.6 corresponding to the unmodified tryptic peptide T135-157. However, ions observed at m/z 1055.7 and 1407.6 (circled) could not be matched to any expected tryptic peptides and were associated with a peptide molecule of 4218.9 Da. The inset of Fig. 2b shows the reconstructed molecular mass profile obtained from these ions. The observed mass for this component was in good agreement with that calculated for the modified tryptic peptide T390-412 comprising 3 neutral 316-Da residues and 2 neutral 408-Da residues (4219.0 Da). It is noteworthy that the isotopic profile also suggests the occurrence of a related molecular species 1 Da lower (4218.0 Da), which was later assigned as a substituted analog where one of the 316-Da residues was replaced by a 315-Da monosaccharide (see below). Similarly, the extracted mass spectrum taken at 22.9 min ( It is noteworthy that this peptide also displayed heterogeneity in the incorporation of a modified residue for which an oxonium ion was observed at m/z 316 (corresponding to a neutral 315-Da residue, data not shown) instead of m/z 317 (corresponding to a 316-Da residue).
The nature of the 316-, 315-, and 408-Da substituents was later assigned as pseudaminic acid (Pse5Ac7Ac), its 5-acetamidino analog (Pse5Am7Ac), and 5,7-N-(2,3-dihydroxypropionyl)-Pse (Pse5Pr7Pr), respectively (see below).
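The peptide masses discussed in this section are obtained from multiply protonated ions by the standard relation M = z(m/z - m_proton). The following is a minimal sketch of that calculation using the m/z values quoted above. The charge states of the ions at m/z 838.1 and 1256.6 are stated in the text; the 4+ and 3+ assignments for m/z 1055.7 and 1407.6 are our inference from the reported mass of about 4219 Da and should be read as an assumption.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass (Da) of an [M + zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


# Unmodified T135-157: triply and doubly charged ions (charges from the text).
unmodified = [(838.1, 3), (1256.6, 2)]
# Modified T390-412: charge states inferred here, not stated in the text.
modified = [(1055.7, 4), (1407.6, 3)]

for label, ions in (("T135-157", unmodified), ("T390-412 + glycans", modified)):
    for mz, z in ions:
        print(f"{label}: m/z {mz} ({z}+) -> {neutral_mass(mz, z):.1f} Da")
```

Agreement between the masses computed from two charge states of the same species is a quick consistency check on the charge assignment.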

MS-MS Analyses of Tryptic Glycopeptides and β-Elimination Products-HPLC fractions comprising the suspected tryptic glycopeptides were subjected to tandem mass spectrometry analyses to identify structural features that could be assigned to key functional groups of the unusual carbohydrate residues. As an example, the MS-MS spectrum of the [M + 3H]3+ ions previously observed at m/z 884.2 in Fig. 2c for tryptic peptide T200-222 yielded an intense oxonium ion at m/z 317 together with consecutive peptide bond cleavages resulting in b- and y-type ions (32) (Fig. 3a). These correspond to fragment ions where the charge is retained on the N-terminal and C-terminal portions of the peptide backbone, respectively. The latter information was consistent with that predicted for the tryptic peptide T200-222 and confirmed the previous assignment. To determine the potential empirical formula of the oxonium fragment ion at m/z 317, accurate mass measurements were obtained using the predicted m/z values of the y2 and b4 fragment ions as calibration points. Upon recalibration the mass measurement accuracy across the entire range was within 5 ppm of that calculated for individual b- and y-type fragment ions. Accordingly, the mass of the neutral carbohydrate moiety was determined to be 316.122 ± 0.004 Da. The precision of this measurement eliminated glycerol phosphatidylinositol as a possible substituent (neutral residue mass 316.0559 Da), although previous investigations have reported the occurrence of a related structure, α-glycerophosphate, as a post-translational modification on Ser residues of Neisseria meningitidis pili (33). Among the different empirical formulas satisfying the defined mass constraints, only C13H20O7N2 (Mr 316.126) was retained as a plausible candidate. This was also substantiated by the second generation fragment ions of m/z 317 formed by collision-induced dissociation of the tryptic glycopeptide ions in the orifice/skimmer region of the mass spectrometer (Fig. 3b). The MS-MS spectrum of this unusual carbohydrate residue was characterized by consecutive losses of neutral groups such as water, ketene (CH2CO), and formic acid (HCOOH). These experiments indicated that the glycosyl moiety was a diamino sugar containing an acid group, two N-acetyl functionalities, and a modified C7 side chain. Together with the NMR studies performed on the same HPLC fraction (see below), these data were consistent with a pseudaminic acid (Pse5Ac7Ac), an unusual carbohydrate residue previously found in the lipopolysaccharide (LPS) of P. aeruginosa (26).
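The comparison between the measured residue mass and candidate compositions can be illustrated with a short calculation. This is a sketch under stated assumptions: the monoisotopic atomic masses are standard values entered by us, the helper names are ours, and the script is not the procedure used in the study. It compares the measured neutral mass (316.122 Da) with C13H20O7N2 and with the 316.0559-Da alternative mentioned in the text.

```python
# Standard monoisotopic atomic masses in Da (values supplied here, not in the text).
MONOISOTOPIC = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return 1e6 * (measured - theoretical) / theoretical


MEASURED = 316.122                       # neutral residue mass from the text
candidate = {"C": 13, "H": 20, "O": 7, "N": 2}
alternative_mass = 316.0559              # glycerol phosphatidylinositol residue, from the text

theoretical = monoisotopic_mass(candidate)
print(f"C13H20O7N2: {theoretical:.4f} Da, error {ppm_error(MEASURED, theoretical):.0f} ppm")
print(f"316.0559-Da alternative: error {ppm_error(MEASURED, alternative_mass):.0f} ppm")
```

The error against the 316.0559-Da alternative is more than an order of magnitude larger than against C13H20O7N2, which is the basis for excluding it.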
In addition to Pse5Ac7Ac, other related structural modifications were also found in C. jejuni flagellin. For example, a second oxonium ion was observed at m/z 316 in a number of peptides from the wild type strain of C. jejuni 81-176, including that of glycopeptide T390-412 (Fig. 2b). Second generation fragment ions of m/z 316 of this tryptic glycopeptide were obtained using higher quadrupole resolution settings to avoid selection of fragment ion m/z 317. The corresponding MS-MS spectrum showed prominent neutral losses of NH3 and CH3CH(NH)(NH2), consistent with the substitution of one of the two acetamido groups of Pse5Ac7Ac by an acetamidino functionality (CH3C(=NH)NH; Fig. 3c). This substitution resulted in a glycosyl moiety 1 Da lower than that of Pse5Ac7Ac and was termed 5-acetamidino-7-acetamido-Pse (Pse5Am7Ac). The occurrence of side chain fragment ions corresponding to the C1-C9 backbone that are common to both Pse5Ac7Ac and Pse5Am7Ac (m/z 134, 180, and 221) suggested that the C5 and not the C7 acetamido group was substituted. Indeed, if the acetamidino group was located on C7, the backbone fragment ion m/z 180 would have been shifted to m/z 179. Confirmation of this assignment using NMR spectroscopy is presently under way, and results from this investigation will be reported separately. This related structure was observed in tryptic glycopeptides T 200 -220 , T390-412, and T423-466, although some heterogeneity in the incorporation level of this residue was noted in these peptides. An O-acetyl derivative of Pse5Ac7Ac, Pse5Ac7Ac8OAc, was also found in the tryptic glycopeptide T 390 -422 . However, the precise location of this residue could not be established unambiguously due to the presence of other related O-linked sugars on the same peptide.
The glycosidic bonds of the tryptic glycopeptides are more labile than peptide bonds, and deglycosylated fragment ions have a structure identical to that of unglycosylated peptide ions, thus preventing the identification of the modification sites.
Another common modification encountered in C. jejuni 81-176 flagellin was noted previously in Fig. 2b and corresponded to a neutral residue with an accurate mass measurement of 408.139 ± 0.003 Da. The mass difference between this residue and Pse5Ac7Ac is 92.027 Da, in good agreement with an incremental C2H4O4 moiety. The second generation product ion spectrum from m/z 409 (data not shown) was consistent with a substituted Pse5Ac7Ac, whereby the two N-acetyl groups were replaced by two N-2,3-dihydroxypropionyl groups (Mr 408.1368) (Pse5Pr7Pr).
Although the modified peptides were identified based on mass differences from the predicted sequence, the individual amino acids bearing the modifications were not as readily assigned. The labile nature of the glycosidic bond between the hydroxyl amino acid Ser/Thr and the carbohydrate residue made it difficult to observe fragment ions comprising the intact modification. To unambiguously assign the site of O-linked attachment, purified glycopeptide fractions were subjected to base-catalyzed hydrolysis in the presence of NH4OH, whereby the β-elimination product incorporated a newly formed amino group of a distinct mass (34). For example, upon β-elimination, O-linked Ser and Thr residues yield modified amino acids of neutral mass of 86 and 100 Da, respectively. This is illustrated in Fig. 4 for the tryptic glycopeptide previously shown in Figs. 2c and 3a, whereby a total of four potential O-linked sites are present in the proteolytic fragment T200-222. In this particular case, only Ser206 was modified with a Pse5Ac7Ac residue, as evidenced by a mass shift of 86 Da between y16 and y17 (Fig. 4). By using this technique, 10 of the 19 modification sites found on C. jejuni flagellin could be uniquely assigned to their corresponding Ser or Thr residues, as seen in Table I. An additional nine sites have been tentatively assigned in peptide T423-466 based on Edman sequencing of the corresponding peptide from C. coli VC167 flagellin (9). In some cases, Pse5Ac7Ac analogues were assigned to specific Ser residues due to their relative lability compared with other occupied sites. This was the case for tryptic glycopeptide T390-412, where Pse5Pr7Pr residues were assigned to Ser397 and Ser404, whereas Thr393, Ser400, and Ser408 were modified with Pse5Ac7Ac residues. Such a situation was found to be the exception rather than the rule, since our MS-MS analyses mainly indicated the types of modifications found on individual peptides rather than assigning the exact substituent attached to each Ser or Thr residue. Table I summarizes these findings and indicates the location and modifications found along the flagellin protein backbone. By using a combination of LC-ESMS and MS-MS experiments, 19 sites of modification were identified on C. jejuni 81-176 flagellin. The predominant modification was Pse5Ac7Ac, along with the related structures Pse5Am7Ac, Pse5Ac7Ac8OAc, and Pse5Pr7Pr. The mass excess associated with these substitutions (ΔMr 6411 Da) was consistent with that measured on the intact flagellins (~6.5 kDa) and with a high level of site occupancy (between 18 and 20 sites, Fig. 1). Although some microheterogeneity was observed on the different O-linked Ser/Thr residues, these experiments indicate that all 19 identified sites are usually occupied in each flagellin monomer.

TABLE I
Assignment of glycosyl substituents observed on tryptic peptides from C. jejuni 81-176 flagellin
The asterisk indicates charged residue. Hydrophobic residues preceding glycosylation sites are boxed.
a Corresponds to monoisotopic molecular mass.
b The single underline indicates a modified residue, whereas a double underline refers to the position of a Pse5Pr7Pr substituent.
c Assignment is based on a previous report of modified residues on C. coli VC167 tryptic peptide by Edman sequencing (9).
d Also confirmed by Edman sequencing of 81-176 tryptic peptide.
e The peptide sequence was confirmed by DNA sequencing of the polymerase chain reaction product corresponding to base pairs 1-1432 (GenBank™ accession number AF345999).
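The 86- and 100-Da residue masses used to locate modified Ser and Thr after β-elimination can be rationalized with a short calculation. The sketch below assumes the usual description of the chemistry, in which the glycosylated residue is converted to its dehydro form (net loss of H2O relative to the unmodified residue) and then adds NH3; the atomic masses and helper names are supplied by us for illustration.

```python
# Standard monoisotopic atomic masses in Da (values supplied here, not in the text).
MONOISOTOPIC = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146}


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Residue compositions within a peptide chain (amino acid minus H2O).
SER = {"C": 3, "H": 5, "N": 1, "O": 2}
THR = {"C": 4, "H": 7, "N": 1, "O": 2}
WATER = {"H": 2, "O": 1}
AMMONIA = {"N": 1, "H": 3}


def beta_eliminated_residue_mass(residue):
    """Residue mass after beta-elimination and addition of ammonia (assumed net: -H2O, +NH3)."""
    return mass(residue) - mass(WATER) + mass(AMMONIA)


for name, residue in (("Ser", SER), ("Thr", THR)):
    print(f"{name}: {mass(residue):.3f} Da -> {beta_eliminated_residue_mass(residue):.3f} Da")
```

A y- or b-ion ladder step matching one of these converted masses, rather than the unmodified 87- or 101-Da residue, marks a formerly glycosylated position, as in the y16 to y17 step described for Ser206.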
The observed NOEs (Fig. 5, C-D) were also found to be consistent with the interproton distances obtained from a molecular model of 5,7-diacetamido-3,5,7,9-tetradeoxy-L-glycero-α-L-manno-nonulosonic acid as drawn in Fig. 5. Since NOEs are highly dependent on interproton distances (r⁻⁶), they are thus also dependent on the structure of the molecule and the correct assignments. A strong NOE (not shown) was observed between the H-3ax and H-3eq resonances, in accord with an interproton distance of 1.8 Å. In Fig. 5C, the (H-4, H-3

Characterization of Genes Involved in Flagellin Modification-Linton et al. (12) reported that mutation of gene Cj1317, annotated as neuB3 encoding a sialic acid synthase in the genome sequence of C. jejuni NCTC 11168, resulted in a non-flagellated phenotype in several strains of C. jejuni. The corresponding region of the genome of C. jejuni 81-176 was cloned. Sequence analysis revealed orthologs of Cj1317, Cj1316c, Cj1315c, and Cj1314c in an order similar to that described in the genome sequence of C. jejuni strain NCTC 11168 (see Fig. 6A), and all four open reading frames encoded predicted proteins whose best match was to proteins from NCTC 11168 (see Table II). Insertional inactivation of the 81-176 neuB3 gene (Cj1317) with a chloramphenicol cassette (Cmr) resulted in loss of motility, and no flagellin was detected in whole cell lysates by Western blotting with a rabbit polyclonal antiserum against 81-176 flagellin (data not shown), confirming the results of Linton et al. (12). However, mutation of the adjacent gene, Cj1316c, encoding a predicted protein of 43.7 kDa in C. jejuni NCTC 11168 (22), resulted in a motile phenotype. The predicted protein encoded by this gene also shares significant similarity to the product of Cj1324c as well as to two genes involved in LPS biosynthesis in Legionella pneumophila and P. aeruginosa O5 serogroup, involved in synthesis of legionaminic acid and mannuronic acid, respectively (see Table II).
Flagellin from the Cj1316c mutant was compared with flagellin from wild type 81-176 on IEF gels. Flagellin from wild type 81-176 separated into multiple glycoforms ranging from approximate pI 4.2-5.2 (Fig. 6B, lane 1), similar to other Campylobacter flagellins (11, 28). In comparison, flagellin from the Cj1316c mutant also showed multiple glycoforms but at a pI range of ~3.5-4.4 (Fig. 6B, lane 2). This shift toward the more acidic region of the IEF gel is consistent with the loss of a basic functionality on the Pse5Ac7Ac structure. The two genes downstream of Cj1316c, Cj1315c and Cj1314c, encode proteins annotated as homologs of HisH and HisF in C. jejuni NCTC 11168 (22), respectively (see Table II). Mutations in Cj1315c and Cj1314c resulted in a motile phenotype, and flagellins isolated from both mutants displayed IEF patterns identical to wild type 81-176 (data not shown). Thus, the insertion into Cj1316c did not appear to exert a polar effect on the downstream genes. Nonetheless, to confirm that the flagellin phenotype observed was due to mutation at this locus and not to phase variation (12, 22) at a distant site, the Cj1316c mutant was complemented in trans with a fragment of DNA containing Cj1316c through Cj1314c in plasmid pRY107/1854 (see Fig. 6B). Flagellin isolated from the complemented mutant (Fig. 6B, lane 3) displayed an IEF pattern intermediate between wild type flagellin (lane 1) and that from the Cj1316c mutant (Fig. 6B, lane 2). This partial complementation is likely due to the presence of the genes on a multicopy plasmid. Flagellin from the Cj1316c mutant containing the pRY107 shuttle plasmid alone appeared identical to the Cj1316c mutant (Fig. 6B, lane 4). The mobility of the lipooligosaccharide cores of wild type 81-176 and those of all four mutants were compared on 16% Tricine gels, and there were no differences observed under conditions that have been shown to detect the loss of a single sugar residue from lipooligosaccharide cores (Ref. 28; data not shown). LC-MS with high/low orifice stepping and reconstructed ion chromatograms on m/z 316 and m/z 317 was done to compare flagellins from the 81-176 parent, the Cj1316c mutant, and the mutant complemented in trans with pRY107/1854. Flagellin from both 81-176 and the Cj1316c mutant carrying pRY107/1854 contained the m/z 316 fragment ion characteristic of Pse5Am7Ac (data not shown). In contrast, flagellin from the Cj1316c mutant no longer displayed this modification, as evidenced by the absence of the relevant signal in the m/z 316 channel for the expected tryptic peptides. Instead, all sites previously occupied by Pse5Am7Ac in the wild type flagellin were replaced by Pse5Ac7Ac residues in the Cj1316c mutant. Based on these data, we propose that Cj1316c be named pseA (pseudaminic acid), indicating its role in the biosynthetic pathway of this related group of molecules.

DISCUSSION

The results presented here show that Campylobacter flagellin is one of the most extensively modified prokaryotic proteins identified to date. A total of 19 sites of O-linked modification have been characterized on flagellin from strain 81-176, representing 10% of the total mass of the protein. The extent of the modifications accounts for the discrepancy observed between predicted mass and the accurate mass measurement obtained in this study. A similar degree of modification was found on other Campylobacter flagellins, although the distribution and possibly the nature of substituents may vary among strains. Although pseudaminic acid (Pse5Ac7Ac) has been identified in the LPS of a number of bacteria (26, 35, 36, 38) and related sugars have been reported in the LPS of L. pneumophila serogroup 1 (37) and capsules of Sinorhizobium meliloti (39, 40), only very recently was the related molecule 5-N-3 hydroxybutyryl-7-N-formylpseudaminic acid reported to be part of a trisaccharide modification on P. aeruginosa pilin (41). We have demonstrated that Pse5Ac7Ac and derivatives are responsible for the extensive glycosylation of Campylobacter flagellin.
Two lines of evidence had suggested that Neu5Ac was a component of the posttranslational modifications of Campylobacter flagellin (7, 11, 13). Doig et al. (8) showed that the sialic acid-specific lectin Limax flavus agglutinin bound to Campylobacter flagellins, but this observation was likely due to cross-reaction of the lectin with the structurally similar Pse5Ac7Ac moieties. Additionally, genomic sequencing of C. jejuni NCTC 11168 revealed the presence of multiple alleles of genes encoding proteins predicted to be involved in Neu5Ac biosynthesis (22). One set of these genes is clearly involved in biosynthesis of the Neu5Ac found in the lipooligosaccharide core of C. jejuni (11, 28), and genetic evidence has suggested that the other two sets of putative Neu5Ac genes are involved in flagellin modifications. Thus, mutation in a gene encoding a putative CMP-Neu5Ac synthetase, termed ptmB (11), and another in a putative Neu5Ac synthase, neuB3 or Cj1317, have been shown to affect flagellin (Refs. 11 and 13 and this work). The E. coli K1 Neu5Ac synthase, NeuB, condenses mannosamine and phosphoenolpyruvate to form Neu5Ac. The homology of the neuB2 (Cj1327c) and neuB3 gene products to E. coli K1 NeuB suggests that the enzyme encoded by either of these genes may be involved in the condensation of a C6 and a C3 sugar precursor to form Pse5Ac7Ac in a similar fashion (42).
The definition of the chemical structure of the glycosyl modifications on flagellin will facilitate elucidation of the enzymatic pathways. Thus, in this study we have identified a role of the Cj1316c gene product, or PseA, in the synthesis of Pse5Am7Ac. As seen in Table II, PseA shows homology to proteins involved in synthesis of two related structures, legionaminic acid and mannuronic acid, in L. pneumophila and P. aeruginosa, respectively. Both of these structures contain molecules with an acetamidino functionality (43, 44). Interestingly, both the L. pneumophila and P. aeruginosa LPS gene clusters also contain orthologs of hisH and hisF, to which Cj1315c and Cj1314c show homology (see Table II). Thus, we cannot exclude a role for these genes in flagellin glycosylation, since our preliminary mutant screening relied on detection of charge changes in IEF gels. Additional experimentation will be required to determine if Cj1315c and Cj1314c are involved in flagellin glycosylation.
The basis of O-linked glycosylation is poorly defined. Only selected serines in Campylobacter flagellin were decorated with a glycosyl group, and Pse5Pr7Pr, Pse5Ac7Ac8OAc, or Pse5Am7Ac are constrained to certain residues, suggesting specificity in the glycosylation process. However, it is interesting that in the absence of Pse5Am7Ac in the pseA mutant, the corresponding residues were replaced with Pse5Ac7Ac. Most of the modified residues were observed in a narrow hydrophobic region of the central core domain located between residues 342 and 481, a region shown to be surface-exposed in the flagellar filament (10). The corresponding region in flagellin from C. coli VC167 had been shown to contain 12 modified serines (9), all of which are conserved in 81-176 flagellin. Based on the glycosylated residues identified in both 81-176 and VC167 flagellins (9), the site of attachment does not appear related to a consensus peptide sequence. Rather, the site of glycosylation appears to be at least partially dependent on local hydrophobicity upstream of Ser/Thr residues (boxed residues in Table I). Hydroxylated residues lying downstream from this local hydrophobic environment are expected to project outward, thereby rendering them accessible to glycosyl transferases (see Table I). Ser and Thr adjacent to acidic or basic residues (see asterisks, Table I) in the central peptide region were not typically glycosylated (e.g. Ser367, Thr354, Thr473, Thr476). O-Linked glycosylation in C. jejuni flagellin may be a partially selective process preferentially targeting residues surrounded by aliphatic/aromatic residues. This is consistent with the surface-exposed site of the single glycosyl moiety added to Neisseria gonorrhoeae pilin at Ser63 (45) but distinct from processes proposed for other prokaryotic O-linked glycosylated proteins (15, 16).
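The qualitative rule suggested above (Ser/Thr preceded by hydrophobic residues and not flanked by charged residues) can be expressed as a simple sequence scan. The sketch below is purely illustrative and is not a predictor derived from or validated against the data in this study; the residue sets, the window size, the function name, and the example sequence are all assumptions introduced here.

```python
HYDROPHOBIC = set("AVLIMFWY")   # assumed aliphatic/aromatic set
CHARGED = set("DEKR")           # acidic and basic residues


def candidate_sites(seq, upstream=3, min_hydrophobic=1):
    """Return 1-based (position, residue) pairs for Ser/Thr that have at least
    `min_hydrophobic` hydrophobic residues in the `upstream` preceding positions
    and no charged residue immediately adjacent on either side."""
    sites = []
    for i, aa in enumerate(seq):
        if aa not in "ST":
            continue
        neighbours = seq[max(0, i - 1):i] + seq[i + 1:i + 2]
        if any(n in CHARGED for n in neighbours):
            continue
        window = seq[max(0, i - upstream):i]
        if sum(c in HYDROPHOBIC for c in window) >= min_hydrophobic:
            sites.append((i + 1, aa))
    return sites


# Hypothetical peptide, not taken from flagellin.
print(candidate_sites("GAVISKDTLLSG"))
```

In this made-up peptide, the Ser and Thr next to Lys and Asp are skipped and only the Ser following two Leu residues is reported, mirroring the pattern described in the text.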
The function of glycosylation of Campylobacter flagellin remains to be determined. The modifications will increase the hydrophilicity of flagellin and are likely to influence the interactions of C. jejuni with eukaryotic cells. The structural similarity of Pse5Ac7Ac to Neu5Ac might also suggest a role in immune avoidance, although there is evidence that at least some of the modifications are immunogenic and antigenically variable within a strain (8-11). Moreover, the non-motile phenotype of neuB3 mutants (Ref. 12 and this work) suggests that glycosylation may affect Campylobacter flagellin subunit interactions and/or assembly, just as it does for Halobacterium flagellin (46) and Neisseria pilin (47).
In this report we have used mass spectrometry and NMR spectroscopy to characterize the structure and extent of glycosylation on Campylobacter flagellin. The identification of the genes involved in synthesis and addition of the glycosyl moieties to flagellin can now be addressed. The data obtained will form the basis for studies on the role of these glycosyl moieties in both flagellar structure and function as well as C. jejuni virulence. Moreover, Campylobacter flagellin provides an excellent model system to study the process of O-linked glycosylation in prokaryotes.