Molecular basis for the structural diversity in serogroup O2-antigen polysaccharides in Klebsiella pneumoniae

Klebsiella pneumoniae is a major health threat. Vaccination and passive immunization are considered as alternative therapeutic strategies for managing Klebsiella infections. Lipopolysaccharide O antigens are attractive candidates because of the relatively small range of known O-antigen polysaccharide structures, but immunotherapeutic applications require a complete understanding of the structures found in clinical settings. Currently, the precise number of Klebsiella O antigens is unknown because available serological tests have limited resolution, and their association with defined chemical structures is sometimes uncertain. Molecular serotyping methods can evaluate clinical prevalence of O serotypes but require a full understanding of the genetic determinants for each O-antigen structure. This is problematic with Klebsiella pneumoniae because genes outside the main rfb (O-antigen biosynthesis) locus can have profound effects on the final structure. Here, we report two new loci encoding enzymes that modify a conserved polysaccharide backbone comprising disaccharide repeat units [→3)-α-d-Galp-(1→3)-β-d-Galf-(1→] (O2a antigen). We identified in serotype O2aeh a three-component system that modifies completed O2a glycan in the periplasm by adding 1,2-linked α-Galp side-group residues. In serotype O2ac, a polysaccharide comprising disaccharide repeat units [→5)-β-d-Galf-(1→3)-β-d-GlcpNAc-(1→] (O2c antigen) is attached to the non-reducing termini of O2a-antigen chains. O2c-polysaccharide synthesis is dependent on a locus encoding three glycosyltransferase enzymes. The authentic O2aeh and O2c antigens were recapitulated in recombinant Escherichia coli hosts to establish the essential gene set for their synthesis. These findings now provide a complete understanding of the molecular genetic basis for the known variations in Klebsiella O-antigen carbohydrate structures based on the O2a backbone.

O2a/O9 (= O2aeh), O3, O4, O5, O7, O11, and O12) were proposed (8). A subsequent proposal also included nine O groups but with a slightly different composition (O1, O2, O2ac, O3, O4, O5, O7, O8, and O12) (9). The repeat-unit structures of known Klebsiella OPSs are shown in Fig. 1 (10-13). Surveys of clinical isolates in these two studies revealed that 82 and 77% (respectively) were accounted for by serotypes O1, O2a, O3, and O5 (8). However, as described below, the structural relationships between O1, O8, and the various O2 subtypes are complex, and the known structures in the O2 subgroups (Fig. 1) were not distinguished in the revised serological systems or in surveys of clinical isolates. Highlighting this deficiency, neither of these published studies distinguished serotype O2afg, which was recently found to be predominant in isolates of the globally disseminated multidrug-resistant ST258 clone (14). Developing reagents for classical serological tests to accurately distinguish closely related O antigens can be challenging, and molecular serotyping methods are increasingly adopted. A recent survey of K. pneumoniae genome sequences discovered that the OPS-biosynthesis (rfb) genetic loci in 93% of the isolates could be assigned to six known serotypes (of which 83% were O1, O2, or O3) (15). Five novel rfb locus variants were identified, but it remains unclear whether these reflect new OPS structures. The rfb gene clusters provide a helpful starting point, but molecular serotyping is challenging when additional (unlinked) genes determine important structural elements and corresponding epitopes. This is the case with the prevalent galactose-based O1, O8, and O2 serotypes from Klebsiella, where the existing serology is particularly complex and includes several O2 subtypes whose precise epitopes have not been established.
These OPSs all contain a shared backbone structural motif, the O2a antigen, composed of alternating α-D-Galp and β-D-Galf residues (also referred to as D-galactan I) (Fig. 1), whose synthesis is directed by genes in the rfb locus located adjacent to the hisI (histidine biosynthesis) gene on the K. pneumoniae genome (Fig. 2) (16). Six genes, wzm-wbbO, are necessary and sufficient for the production of the O2a antigen when expressed in Escherichia coli hosts (17-20). Hereafter, these six genes are collectively referred to as rfb 2a. The O-antigen assembly process occurs at the cytoplasmic face of the inner cell membrane using undecaprenol-phosphate (Und-P) as an acceptor. Synthesis is initiated by transfer of GlcNAc-1-P to this lipid by WecA (18), a conserved phosphoglycosyltransferase enzyme, to form Und-PP-linked intermediates. The WbbM, WbbN, and WbbO proteins contain glycosyltransferase (GT) motifs, and all are required for O2a-antigen biosynthesis in vivo (18-20). The glf gene encodes UDP-galactopyranose mutase, which converts UDP-Galp to UDP-Galf (21).

Klebsiella O2 antigens
The ABC transporter (whose transmembrane and nucleotide-binding domains are encoded by wzm and wzt, respectively) exports the completed lipid-linked OPS to the periplasm (16), where it is ligated to lipid A core and translocated to the outer membrane (22). An additional open reading frame (orf7) at the 3′ end of the O-antigen biosynthesis operon encodes a predicted glycosyltransferase (14), but it has no known function and is not required for expression of authentic O2a OPS in an E. coli host (16).
The O2a-antigen structure was first identified in a subset of LPS molecules in serotype O1 (10, 23) and later as the sole OPS in serotype O2a (12). The O1 antigen (also called D-galactan II) is composed only of D-Galp residues (10, 23) and forms a structurally distinct polymeric domain linked to the non-reducing end of chains of O2a polysaccharide (10, 24). The serotype O2ac antigen is also coexpressed with the O2a glycan and is proposed to possess a similar tandem arrangement (12, 24). Additional O2 variants are composed of an O2a-antigen backbone modified by (1→2)- or (1→4)-linked α-D-Galp side groups, forming serotypes O2aeh and O2afg, respectively (11). The O2afg structure is the predominant OPS in isolates of the ST258 clone (14), and it can be further modified by the addition of the O1 antigen in about 40% of tested O2afg isolates (25). Further glycosylation of the O2a antigen in serotypes O1 (26) and O2afg (14) is essential for resistance to serum-mediated killing, providing an obvious selective advantage to isolates with modifications. Several of these glycans are also O-acetylated. For example, the carbohydrate backbones of the OPSs of serotype O1 and O8 are identical (and serologically cross-reactive), but they differ in partial O-acetylation of the O2a-antigen component of O8 (27). Also, the serotypes originally designated O9 and O2ae have essentially the same glycan structure as O2aeh, differing only in the frequency of side-branch addition (28).
Acetylation status does not seem to significantly alter the recognition of the OPSs by antibodies raised against the nonacetylated carbohydrate backbones (8) and is probably not an important factor from an immunotherapy perspective.
Immunotherapeutic approaches to treat Klebsiella infections require that antibodies against O antigen are protective in vivo. A monoclonal antibody against an epitope in O1 offered some protection at high dosage in an experimental infection model (29). More recently, a humanized monoclonal recognizing the O2afg antigen in the ST258 clone was protective in a murine model of endotoxemia (30). However, immunotherapeutic strategies require a complete understanding of the full range of OPS structures and their clinical prevalence. The objective of this study was to complete our understanding of the molecular basis for diversity in the carbohydrate structures of O1/O2 OPSs from the recognized serotypes. To this end, we identified new genetic loci that determine the O2aeh and O2c antigens and correlated OPS genetic complements to structures in the reference strains that were used to determine the OPS structures in the known serotypes.

Biosynthesis of the K. pneumoniae O2afg antigen in the prototype strain, CWK55, is dependent on an rfb-linked gmlABC gene cluster
In serotype O2afg, the products of three additional genes (gmlABC) located between the O2a-antigen biosynthesis operon and hisI (Fig. 2) are involved in the biosynthesis of the O antigen containing an α-(1→4)-linked Galp side group (14, 25). A plasmid containing the gmlABC genes (hereafter gmlABC 2afg) from clinical isolate KP-27 converted a Klebsiella strain expressing the O2a antigen to the O2afg serotype (14). To confirm that no additional Klebsiella genes were required for O2afg OPS biosynthesis and to correlate these data with the isolate used to determine the structure of the O2afg polysaccharide (11), we reconstituted the system in E. coli K-12. The genomic region between wbbO and hisI, containing the putative gml 2afg cluster, was amplified from CWK55, the prototype strain for the O2afg antigen (11). Sequence analysis revealed the presence of the gmlABC 2afg cluster with the same organization as that described for the K. pneumoniae O2afg strain, NTUH-K2044 (GenBank™ accession number AP006725) (14), and a BLAST comparison of the predicted protein sequences from these two isolates showed 100, 99, and 98% identity for GmlA 2afg, GmlB 2afg, and GmlC 2afg, respectively. E. coli DH5α was cotransformed with pWQ393 (gmlABC 2afg) and pWQ288 (rfb 2a). The SDS-PAGE profile of LPS from DH5α [rfb 2a, gmlABC 2afg] revealed a typical LPS ladder pattern reflecting the distribution of OPS-substituted lipid A core (Fig. 3A). However, the individual bands in the ladder profile exhibited a band shift, relative to a control sample from DH5α [rfb 2a] synthesizing only the O2a antigen (Fig. 3A), consistent with a structural modification of the O2a polysaccharide due to the activities of GmlABC 2afg. In immunoblots, LPS from DH5α [rfb 2a, gmlABC 2afg] did not react with antiserum raised against a K. pneumoniae strain expressing the O2a antigen (Fig. 3B), but this LPS did react with O2afg-specific antiserum (Fig. 3C).

Figure 2. [...] The O serotypes are indicated at the left along with the designations of the strains from which the DNA sequences were obtained. hisI defines the 3′ end of the O2a-antigen biosynthesis cluster (rfb region). The wzm-wbbO genes are necessary and sufficient for biosynthesis of the O2a antigen. The function of orf7 is unknown, and it is not necessary for biosynthesis of OPS in E. coli K-12. In serotype O2afg, the gmlABC cluster is located adjacent to the rfb region, whereas in O2aeh and O2ae, this cluster (gmlABD) is located next to the genomic proA locus. orf8 is present at the 3′ end of the rfb region in CWK53 and encodes a putative acetyltransferase. The O1 (wbbY) and O2c gene clusters (wbmVWX) are not linked to the rfb region, and they are flanked by transposase genes (indicated by tnp and the insertion element (IS) family to which they belong). wbbY can be present in strains expressing either the O2a or the O2afg (gmlABC) antigens, and an example of each is shown.
Figure 3. [...] A and D, silver-stained SDS-PAGE of LPS in whole-cell lysates. The corresponding immunoblots were probed with antiserum specific for O2a (B and E), the O2afg OPS of CWK55 (C), or the O2aeh OPS of CWK53 (F). The rfb 2a genes were expressed from pWQ288. The gmlABC 2afg and gmlABD 2aeh gene clusters were contained on pWQ393 and pWQ394, respectively. Some cross-reactivity was observed with the anti-O2a serum and unsubstituted lipid A-core molecules in CWK53 and CWK55. The orphan band in the last lane of E reflects an undigested protein antigen, which sometimes occurs in such lysates. In F, the higher reactivity of the wildtype CWK53 LPS (compared with the recombinant) could reflect an epitope(s) contributed by O-acetyl groups present in CWK53 but absent from the recombinant.

To confirm the antigenic conversion of the O2a antigen to O2afg, LPS was purified from DH5α [rfb 2a] and from DH5α [rfb 2a, gmlABC 2afg], and the chemical structures of the OPS fractions were determined by a combination of 1D and 2D 1H and 13C NMR spectroscopy (Fig. 4, Fig. S1, and Table 1). As expected, the OPS fraction from the control LPS of DH5α [rfb 2a] gave a 13C NMR spectrum identical to that reported for the O2a antigen (12, 23) (Fig. 4). A comparison of the 13C NMR spectra from O2a and O2afg polysaccharides showed that the latter contained six additional signals, including a signal for an additional anomeric carbon at δ 101.6 ppm (Fig. 4, O2afg, G1). The assignment of 1H and 13C signals was performed based on 2D COSY, TOCSY, ROESY, 1H,13C HSQC, and HMBC experiments (Fig. S1 and Table 1). NMR data demonstrated that the O2afg polysaccharide was composed of an O2a backbone in which the Galp residue was substituted with an α-(1→4)-linked Galp side group. 1H and 13C NMR chemical shifts were in good agreement with those previously reported for the O2afg structure (11, 14). Comparison of the HSQC spectra of the O2a and the O2afg polysaccharides revealed a series of small signals that could originate from unmodified repeat units. Based on integral intensities of Galf H-2 signals (δH 4.33 and 4.39 in modified and non-modified repeat units, respectively), more than 90% of the backbone Galp residues were substituted with α-D-Galp. These data demonstrated that the OPS expressed from DH5α [rfb 2a, gmlABC 2afg] was identical to the O2afg antigen originally reported for CWK55 (11) and confirmed the involvement of gmlABC 2afg reported previously (14). Furthermore, the recapitulation of this structure in E. coli defined the minimal gene complement required for authentic O2afg biosynthesis. This provided an essential foundation for subsequent work.
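The degree of side-group substitution estimated above is a simple ratio of the two Galf H-2 integrals. A minimal sketch of that arithmetic follows; the function name and the integral values (94 and 6, arbitrary units) are illustrative assumptions and are not measured data from this study.

```python
def substituted_fraction(integral_modified, integral_unmodified):
    """Fraction of repeat units carrying the side group, from two signal integrals."""
    total = integral_modified + integral_unmodified
    if total <= 0:
        raise ValueError("integrals must sum to a positive value")
    return integral_modified / total


# Hypothetical integrals (arbitrary units) for the Galf H-2 signals at
# delta-H 4.33 (modified repeat units) and 4.39 (unmodified repeat units).
fraction = substituted_fraction(94.0, 6.0)
print(f"{fraction:.0%} of backbone Galp residues substituted")
```

The same ratio could be applied to any pair of well-resolved signals that report on modified and unmodified repeat units, provided the two signals have comparable relaxation behaviour.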
Some Klebsiella isolates with the O2afg genes also contain the unlinked wbbY gene responsible for producing the O1 antigen (31), indicating that the O1 polysaccharide (Fig. 1) can be added to the non-reducing end of either modified or unmodified O2a antigen (25). The O1 antigen is not produced by CWK55 (12). The wbbY gene is present near the xynB locus in the O2afg reference genome of K. pneumoniae NTUH-K2044 (14, 25) but was absent from this region of the CWK55 genome (data not shown). Furthermore, a BLASTP search failed to identify a predicted WbbY homolog from the total CWK55 genomic data.

A gml locus unlinked to the rfb region is required for biosynthesis of the O2aeh O antigen
The O antigen of K. pneumoniae O2aeh differs from the O2afg polymer in the linkage ((1→2) versus (1→4)) of its α-D-Galp side group (11) (Fig. 1). Although not discussed in the original report (14), analysis of the gmlABC 2afg gene products indicates that they are derived from a three-component cassette, resembling other systems that direct the periplasmic modification of glycan backbones in a variety of contexts, including the addition of glucosyl side groups to OPSs of Salmonella and Shigella (reviewed in Ref. 32). In these well-documented glucosylation systems, products of the gtrABC genes modify nascent OPS in the periplasm. GtrB proteins (GmlB homolog) are related to eukaryotic dolichol-phosphate mannose synthase enzymes participating in protein glycosylation (33) and synthesize an Und-P-glucose intermediate from the UDP-glucose donor at the cytoplasmic face of the inner membrane (34). The use of a monophosphoryl intermediate presumably prevents crosstalk between the side-group modification reactions and OPS polymerization, which utilizes Und-PP. Indirect evidence implicates GtrA (GmlA homolog) as the component required to translocate the lipid-linked sugar to the periplasmic face. Predicted structures of GtrA proteins (and the homologous GmlA) are similar to the multi-antimicrobial extrusion (MATE) family of proteins, which includes the ArnEF heterodimer, an Und-P-Ara4N flippase involved in modification of LPS lipid A (35). GtrC (GmlC homolog) is a GT-C family glucosyltransferase and adds the side-group glucose to the nascent OPS before ligation to the lipid A-core molecule (36).

Figure 4. [...] and in combination with the gmlABC 2afg genes from K. pneumoniae CWK55 (O2afg) and gmlABD 2aeh from CWK53 (O2aeh). Signals were assigned based on 1H NMR and 2D COSY, TOCSY, ROESY, 1H,13C HSQC, and HMBC experiments (for HSQC and HMBC spectra, see Figs. S1 and S3). The alphanumeric designations above the NMR peaks refer to carbons in the sugar residues labeled on the repeat-unit structure at the right of each spectrum. The 13C NMR glycosylation effects on G C-1 (+8.1 ppm in O2afg and +3.2 ppm in O2aeh) are in good agreement with the reported values of +8.0 ppm and +4.0 ppm for the α-(1→4) and α-(1→2) linkages, respectively (71). Downfield displacement of the signals for P C-4 in the O2afg spectrum and P C-2 in the O2aeh spectrum (by 9.0 and 2.5 ppm, respectively, compared with their position in →3)-substituted P) confirmed the positions of side-chain galactose. A small portion of the repeating units were unmodified in both the O2afg (~6%) and the O2aeh (~15%) polysaccharides. The rfb 2a genes were expressed from pWQ288. The gmlABC 2afg and gmlABD 2aeh gene clusters were contained on pWQ393 and pWQ394, respectively.
It was predicted that biosynthesis of the O2aeh OPS occurred through a pathway analogous to O2afg, requiring the activities of the GmlABC 2afg homologs. Whole-genome shotgun sequencing was performed on K. pneumoniae CWK53, and the nucleotide sequence between orf7 and hisI revealed a single open reading frame (Fig. 2; orf8). A query of the NCBI Conserved Domain Database (37) revealed that the orf8 gene product contained a domain with similarity to the acyltransferase-3 superfamily. Members of this superfamily include OafA and the Oac proteins involved in O-acetylation of OPSs in Salmonella and Shigella, respectively (38-41). The CWK53 O antigen is O-acetylated non-stoichiometrically (11), and orf8 provides a candidate for the required enzyme. Using BLASTP, a database of predicted polypeptides derived from the CWK53 genomic sequence was queried with the K. pneumoniae CWK55 GmlB amino acid sequence, and a candidate gml locus was identified at a position between proA and a tRNA-Thr coding region on the CWK53 genome (Fig. 2). The predicted amino acid sequences for the CWK55 and CWK53 GmlA and GmlB proteins shared identities of 78% (over 115 amino acids) and 79% (over 308 amino acids), respectively (Fig. S2). The product of the third gene (which we name gmlD) in the O2aeh locus shows no primary structure homology with GmlC 2afg. This is expected from the Gtr glucosylation loci, where the initial two enzymes in the pathway provide conserved reactions and the GtrC proteins are serotype-specific glucosyltransferases with differing sequences (32). The carbohydrate structure of the K. pneumoniae O2ae OPS is identical to O2aeh (11). Analysis of a whole-genome shotgun sequence of the O2ae prototype, CWK52, identified the gmlABD 2ae genes at the same genomic locus as in CWK53. The respective GmlA and GmlB amino acid sequences were identical in the two strains, and the GmlD homologs shared 99% identity.
The O1 antigen is not expressed in either CWK52 or CWK53 (11, 23), and the wbbY locus was absent in the genomes of these strains.
The amplified gmlABD 2aeh genes were cloned to generate pWQ394. As in the corresponding experiments with O2afg LPS, the ladder in the silver-stained LPS profile from DH5α [rfb 2a, gmlABD 2aeh] exhibited a band shift, compared with DH5α [rfb 2a] LPS (Fig. 3D). This LPS reacted only with antibodies raised against O2aeh LPS from the reference strain (Fig. 3, E and F). These data suggested that the gmlABD 2aeh gene cluster from CWK53 was indeed responsible for serotype conversion of the O2a OPS to the O2aeh serotype, and this was confirmed by structural analysis. 1H and 13C NMR spectroscopy of the OPS fraction purified from DH5α [rfb 2a, gmlABD 2aeh] was used to confirm the presence of the α-(1→2)-linked Galp side group (Fig. 4). 2D COSY and TOCSY spectra of the recombinant O2aeh polysaccharide revealed spin systems for three sugar residues (designated F, P, and G), all having a galacto configuration. Correlations of the anomeric protons H-1 with H-2, H-3, and H-4 were traced for each spin system, and the remaining H-5 and H-6 were assigned based on 1H,1H ROESY and 1H,13C HMBC data. The assignment of 13C chemical shifts was performed using 1H,13C HSQC and HMBC experiments (Fig. S3 and Table 1). The position of F C-1 at δC 110.4, the positions of F C-2 to C-4 in the region δC 82-86 ppm, and a strong H-1/C-4 correlation observed in the HMBC spectrum are characteristic of β-furanose residues (42). Based on 13C chemical shifts, residues P and G are pyranoses. The α-anomeric configuration of P was established by the small 3J1,2 coupling constant of 3.6 Hz. The signal for G H-1 was unresolved with ν1/2 < 5 Hz. The α-anomeric configuration of G was inferred from its C-5 chemical shift at δ 72.3 and confirmed by H-1/H-2 correlation in the ROESY spectrum. Interresidue correlations between anomeric carbons and protons at the linkage carbons G C-1/P H-2, P C-1/F H-3, and F C-1/P H-3 observed in the HMBC spectrum (Fig. S3) revealed the positions of substitution and the sequence of the residues in the repeat unit. Hence, the O2aeh OPS obtained from the recombinant strain has a backbone containing [→3)-β-D-Galf-(1→3)-α-D-Galp-(1→] disaccharide repeat units, whose Galp residues are substituted with α-Galp at position 2, identical to the carbohydrate structure of the authentic product (11).

Production of the K. pneumoniae O2c antigen is thermoregulated, and the biosynthesis genes are unlinked to the rfb 2a locus
The K. pneumoniae O2c antigen is a polymer of alternating β-D-Galf and β-D-GlcpNAc residues co-expressed on the cell surface with the O2a OPS (Fig. 1) (12). NMR data predicted a tandem arrangement of the O2a and O2c antigens in a format resembling the O1 OPS. Using a PCR-based approach, Fang et al. (43) demonstrated that the K. pneumoniae O1 and O2ac serotype strains could be distinguished by the differences in their respective wbbY alleles. In O2ac isolates, wbbY is truncated relative to the O1 allele, and their DNA sequences diverge toward the 3′ end (43), but the functional implications of this variation are unclear. The O1 polysaccharide was not identified in the Klebsiella O2ac prototype strain 5053 (12), so the impact of "wbbY 2ac" from 5053 on OPS biosynthesis was assessed by introducing a plasmid (pWQ398) containing "wbbY 2ac" into E. coli DH5α [rfb 2a]. The LPS from E. coli DH5α [rfb 2a, wbbY 2ac] was not recognized by either an O1-specific mAb or O2c-specific antiserum (data not shown), and it was concluded that the truncated O2c "wbbY" allele does not encode a functional protein.
To identify the genes required for biosynthesis of the O2c antigen, whole-genome shotgun sequence from K. pneumoniae 5053 was examined for candidate glycosyltransferases that were not ascribed to known glycan biosynthesis pathways. A cluster of three open reading frames (designated wbmV, wbmW, and wbmX), each encoding a putative glycosyltransferase, was identified as part of a 12,706-bp contig (Fig. 2). The gene cluster was cloned from 5053 DNA to produce the recombinant plasmid pWQ395. DH5α [rfb 2a, wbmVWX] produced LPS composed of both the O2a antigen and an additional OPS recognized by antiserum specific for the O2c antigen (Fig. 5). All three genes in the cluster are essential for the production of the O2c antigen (Fig. 6). Interestingly, expression of the O2c antigen in both the wildtype Klebsiella 5053 and the E. coli-based recombinant was low at 37°C but enhanced significantly by growth at 30°C (Fig. 5), indicating that synthesis of this glycan is temperature-regulated. In the wildtype Klebsiella 5053, the increase in O2c antigen was accompanied by a marked reduction in the amount of O2a antigen. These LPS phenotypes were more striking in K. pneumoniae 5053, presumably because the elevated gene copy number in the recombinant strain masked some of the regulatory effects. Temperature regulation of the O2a or O1 antigens was not observed in K. pneumoniae O1 strains (data not shown).
The chemical structure of the OPS produced by DH5α [rfb 2a, wbmVWX] was confirmed by NMR spectroscopy. The complete assignment of 1H and 13C resonances (Table 1) was performed by a combination of 1D and 2D experiments, including 2D COSY, TOCSY, ROESY, 1H,13C HSQC, and HMBC (Figs. S4 and S5). The 1H and 13C chemical shifts are in good agreement with those reported previously for the O2ac polysaccharide (12). Based on the integral intensities in the 1H NMR spectrum, the ratio between the O2a and O2c polymers is ~1:1.4. In the original report establishing the chemical structure of the O2ac OPS from K. pneumoniae 5053, periodate oxidation studies suggested that the O2a and O2c antigens represented two distinct glycans (each attached to independent LPS molecules) on the bacterial cell surface (12). However, subsequent NMR data were consistent with a direct linkage between a terminal β-D-Galf residue in the O2a polysaccharide and a GlcpNAc residue from the O2c antigen (24) in a format resembling the O1 OPS (Fig. 1). Attempts to determine the type of linkage (if present) between them by interpreting the minor signals in the NMR spectra here were unsuccessful, due to very low intensity and significant overlap with the signals of the internal repeat units. An in vivo approach was therefore used to unequivocally establish the structural format, by examining whether assembly of the O2c antigen was dependent on concurrent biosynthesis of the O2a polymer. E. coli K-12 strain CWG286 was transformed with plasmid pWQ395 (wbmVWX) as well as one of a series of plasmids containing rfb 2a loci with single mutations in each gene. CWG286 has a deletion spanning the rfb K-12 region (21), ruling out any contribution to OPS biosynthesis from the E. coli K-12 pathway. Biosynthesis of the O2c OPS was clearly dependent on all genes in the rfb 2a cluster (Fig. 7). A dependence on glf to provide the UDP-Galf sugar donor was expected, but dependence on the O2a glycosyltransferases (wbbMNO) and transporter (wzm, wzt) implicated nascent O2a antigen as an acceptor for polymerization, supporting the conclusion that the O2c antigen is indeed added to the non-reducing terminus of O2a OPS chains.

Figure 5. A gene cluster unlinked to the rfb region is involved in thermoregulated biosynthesis of the O2c antigen. A, silver-stained SDS-PAGE of LPS from whole-cell lysates of the wild-type K. pneumoniae 5053 and E. coli DH5α harboring recombinant plasmids containing genes required for the biosynthesis of the O2a and the O2c OPS. B and C, corresponding immunoblots probed with antisera specific for the O2c and O2a OPS, respectively. Cultures were grown at 30 or 37°C. The rfb 2a genes were expressed from pWQ288. The wbmVWX genes were provided by pWQ395.

Discussion
Published serological re-evaluations of the Klebsiella O-serotype reference strains led collectively to a consensus of 11 O serotypes (O1, O2a, O2ac, O2aeh (O9), O3, O4, O5, O7, O8, O11, and O12) (8, 9). The carbohydrate structures of O1 and O8 are identical (27), and the structure of O11 remains unknown. However, considering the chemical structures of the known OPS molecules, as well as the biosynthesis genes unique to the serotypes and the distribution of these structures in clinical isolates, the number of distinct O-antigen types can minimally be expanded to include O2afg, which is linked to the ST258 clone (14). Here we described the molecular basis of additional O2 subgroups, focusing only on the carbohydrate components of the structures. Partial O-acetylation can generate additional epitopes that contribute additional O2 subfactors (11, 28) and provide the only difference between O1 and O8 (27). However, these modifications do not prevent recognition by antibodies generated against the non-acetylated versions of these polysaccharides (8). Without exhaustive cross-absorption of polyclonal sera or more precise monoclonal antibodies, the acetylated/non-acetylated forms will not be distinguished in conventional serological tests.

Figure 6. [...] A, to determine whether all three genes in the wbmVWX cluster are necessary for the expression of the O2c antigen, a series of constructs was made, each eliminating one of the genes. B, silver-stained SDS-PAGE of LPS from whole-cell lysates of E. coli DH5α harboring recombinant plasmids containing genes required for the biosynthesis of the O2a OPS and derivatives of the O2c wbm gene cluster. C and D, corresponding immunoblots probed with antisera specific for the O2c and O2a OPS, respectively. Cultures were grown at 30°C. The rfb 2a genes were expressed from pWQ288. Plasmids pWQ395, pWQ895, pWQ896, and pWQ897 expressed the gene combinations wbmVWX, wbmVX, wbmVW, and wbmWX, respectively.

Figure 7. [...] and immunoblotting with antisera specific for O2a (B) and the O2c antigen (C). Und-PP-linked OPS accumulates intracellularly in ABC-transporter (wzm wzt) mutants. This material is poorly stained with silver but detectable by immunoblotting (70). Individual rfb 2a mutations were provided by plasmids pWQ517 (ΔwbbM), pWQ549 (wbbN*; frameshift mutation), pWQ516 (ΔwbbO), pWQ289 (Δwzm Δwzt), and pWQ633 (Δglf).
Serological typing methods do not easily distinguish all of the galactose-containing O antigens. A PCR-based genotyping method was developed to differentiate the major Klebsiella O groups based on variation among the wzm-wzt alleles within the rfb region (43), but the shared rfb region present in all O1 and "O2" strains precluded subclassification within this serogroup. This method did discriminate between the O1 and O8 serotypes, which seems surprising, given the conserved function of these transporters. However, this is explained by the phylogenetic separation of O8 isolates from O1 and "O2" and consistent with rfb DNA hybridization analyses (27). Fang et al. (43) distinguished serotype O1 and O2ac strains by differences in the wbbY alleles, which are truncated in O2ac isolates and shown here to generate a non-functional enzyme. With data reported here, O-genotyping could now be refined with a positive PCR by utilizing oligonucleotides specific for the O2c biosynthesis gene(s). As a diagnostic tool, genetic classification of Klebsiella O types should be considered complementary to serological typing, as strains testing positive for a particular OPS biosynthesis locus may harbor mutations preventing expression of the antigen on the cell surface (9,43).
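The gene-to-structure relationships reported in this study could, in principle, be encoded as a simple lookup for refining O-genotyping within the O1/O2 group. The sketch below is a minimal illustration under stated assumptions: the function name and the locus labels are hypothetical, "wbbY" is taken to mean an intact (non-truncated) allele, and combinations not described in the text (for example, gmlABD together with wbbY) are extrapolations. As noted above, a genotype only predicts a structure; it does not show that the antigen is expressed.

```python
def predict_o1_o2_structure(detected):
    """Predict the O2a-based backbone type and terminal domains from detected loci.

    `detected` is an iterable of hypothetical labels: "rfb2a", "gmlABC",
    "gmlABD", "wbbY" (intact allele only), "wbmV", "wbmW", "wbmX".
    Returns None if rfb2a is absent, otherwise (backbone, terminal_domains).
    """
    detected = set(detected)
    if "rfb2a" not in detected:
        return None  # no O2a backbone, so outside the O1/O2 group

    has_gml_abc = "gmlABC" in detected
    has_gml_abd = "gmlABD" in detected
    if has_gml_abc and has_gml_abd:
        backbone = "ambiguous"  # combination not described in the source
    elif has_gml_abc:
        backbone = "O2afg"  # alpha-(1->4)-Galp side group
    elif has_gml_abd:
        backbone = "O2aeh"  # alpha-(1->2)-Galp side group
    else:
        backbone = "O2a"  # unmodified backbone

    terminal_domains = []
    if "wbbY" in detected:
        terminal_domains.append("O1")
    # All three wbm genes are needed for the O2c antigen.
    if {"wbmV", "wbmW", "wbmX"} <= detected:
        terminal_domains.append("O2c")
    return backbone, terminal_domains


print(predict_o1_o2_structure({"rfb2a", "wbmV", "wbmW", "wbmX"}))
print(predict_o1_o2_structure({"rfb2a", "gmlABC", "wbbY"}))
```

Returning the backbone and the terminal domains separately keeps the two levels of variation apart: side-group modification of the O2a backbone, and addition of a second glycan domain at its non-reducing end.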
The gmlABC 2afg and gmlABD 2aeh loci are responsible for modifying the O2a-antigen backbone with α-(1→4)- and α-(1→2)-linked galactose residues, respectively. The sequence differences in the GmlC 2afg and GmlD 2aeh orthologues make these genes useful markers in a genotyping scheme for distinguishing these O-serotypes. The O2aeh OPS is structurally similar to that of O2ae (which is identical to O9). These polymers are acetylated, and the only detectable difference is the frequency of both acetylation and modification by the α-(1→2)-D-Galp side group (11). The physical separation of polymerization (cytoplasm) and side-chain addition (periplasm) creates an opportunity to have populations of LPS species with varying amounts of modification or potentially subdomains within an individual chain that vary in the extent of modification. The gmlABD sequences of O2aeh and O2ae are nearly identical, so the variable stoichiometry of Galp substitution is most likely due to differences in transcription/translation of one or more biosynthetic genes or some (unknown) instability that leads to loss of the locus in a portion of the population. Serotype O2afg has been associated with the drug-resistant strain ST258 (14) and shown to be frequently distributed in a globally representative collection of K. pneumoniae isolates (15). In this collection of 573 whole-genome sequenced isolates, 387 isolates (67%) contain the rfb 2a locus. We re-examined this collection using the new genetic insight. Of the 387 isolates carrying rfb 2a, 164 (42%) were identified as containing the gmlABC 2afg cluster, showing the seroepidemiological importance of the O2afg antigen (Fig. S6). Surprisingly, no gmlABD 2aeh cluster was detected in any of these isolates, leaving the natural distribution of serotype O2aeh uncertain.
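The proportions quoted above can be recomputed directly from the isolate counts given in the text. The short sketch below does only that arithmetic; the helper name `percent` is an assumption introduced for illustration, and the values are printed to one decimal place, so they may differ slightly from the whole-number percentages reported in the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the text.
total_genomes = 573      # whole-genome sequenced isolates in the collection
with_rfb2a = 387         # isolates containing the rfb 2a locus
with_gmlabc_2afg = 164   # rfb 2a isolates that also carry gmlABC 2afg

print(f"rfb 2a among all isolates: {percent(with_rfb2a, total_genomes):.1f}%")
print(f"gmlABC 2afg among rfb 2a isolates: {percent(with_gmlabc_2afg, with_rfb2a):.1f}%")
print(f"gmlABC 2afg among all isolates: {percent(with_gmlabc_2afg, total_genomes):.1f}%")
```

Stating the denominator explicitly matters here, because the 42% figure refers to the rfb 2a-containing subset and not to the full collection.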
The majority of three-component glycan modification systems described so far in Gram-negative bacteria modify OPS produced by Wzy-dependent biosynthesis pathways, where Und-PP-linked oligosaccharide repeat units are translocated to the periplasm for polymerization (32). The Klebsiella O antigens are synthesized by the ABC-transporter pathway, in which the entire OPS is polymerized on Und-PP in the cytoplasm, before export to the periplasm. In a hybrid ABC transporter-dependent pathway reconstituted in E. coli K-12, a host Gtr periplasmic glucosylation system modifies the K. pneumoniae O12 OPS (44), illustrating that the polymerization strategy poses no barrier for these types of modification. The Klebsiella O2afg and O2aeh strains are the only examples in which Gtr-like modifications have been unequivocally demonstrated for native ABC transporter-dependent OPS biosynthesis pathways. In E. coli, Shigella, and Salmonella, the gtrABC genes are often associated with lysogenic bacteriophages, and the associated OPS side-group modifications may function to prevent superinfection by bacteriophages that recognize the unmodified OPS as a binding receptor (44). Altering the OPS may also promote bacteriophage reproduction by promoting immune evasion and enhancing survival of the host bacteria (45,46). In one study, 20 of 22 Salmonella enterica subspecies enterica genomes contained between two and four gtr loci (45), each conferring a distinct OPS modification. These gtr loci are located on prophages or within bacteriophage-derived regions, giving rise to a mobile trait that can diversify by recombination. The gmlABD loci in K. pneumoniae O2aeh and O2ae are located next to a tRNA-Thr gene, whereas the gmlABC 2afg locus is situated between the rfb 2a cluster and hisI. There is no evidence of homology to bacteriophage DNA within or near either of the gml gene clusters.
The K. pneumoniae O2ac genes wbmVWX are required for the biosynthesis of the O2c OPS composed of alternating D-Galf and D-GlcpNAc added to the end of the rfb 2a -encoded O2a antigen. The OPS structure (Fig. 1) predicts three required glycosyltransferases creating three linkage types, two for the disaccharide O2c-antigen repeat unit itself and one for the transition between the two glycan domains (i.e. between O2a and O2c). In K. pneumoniae 5053, this gene cluster was flanked by putative insertion elements, suggesting that this region was acquired by a transposition event. Horizontal acquisition of the locus is supported by the relatively low G+C content (35%) compared with 56% G+C for the entire genomic sequence (data not shown). In Klebsiella, the O2c antigen is thought to be extended on a Galf residue from the O2a polysaccharide (24). It is conceivable that the O2c antigen could also be assembled on glycans with different repeat unit structures, provided that an appropriate terminal Galf is available as an acceptor. We re-examined the collection of 573 Klebsiella genomes and identified 14 isolates where the full wbmVWX locus was present and two others with a truncated cluster lacking wbmX (Fig. S6). Three of these 14 isolates also possess the gml 2afg locus, potentially giving rise to more serotype complexity. Furthermore, 10 of the 16 isolates are from the pre-antibiotic collection, consisting of isolates collected before 1947. Interestingly, those strains with the full wbmVWX locus lack wbbY, whereas those possessing only wbmV and wbmW possess an intact wbbY and would probably serotype as O1.
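The G+C comparison used above as evidence for horizontal acquisition can be illustrated with a minimal sketch. The function name `gc_content` and the toy sequence are hypothetical and are not taken from the K. pneumoniae 5053 genome; the calculation simply counts G and C among the unambiguous bases, which is an assumption about how ambiguous positions would be handled.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    # Count only unambiguous bases; N and other IUPAC codes are ignored (assumption).
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Hypothetical toy sequence for illustration only.
toy_locus = "ATGAAATTACGTATTAAAGC"
print(f"G+C content: {gc_content(toy_locus):.1f}%")
```

In practice the same function would be applied to the locus sequence and to the whole-genome sequence, and the two percentages compared.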
The O2c antigen provides the only example of thermoregulated OPS in K. pneumoniae. The details of the thermoregulation have not been investigated and are not central to the objectives of the current study. Unfortunately, we do not have other authenticated O2ac isolates to confirm that this property is conserved in all representatives of this serotype. Temperature regulation has been described for the biosynthesis of a few other OPSs in other bacteria. The Yersinia enterocolitica O:8 OPS is expressed at 22-25°C but transcriptionally down-regulated at 37°C (47). Its expression is critical for virulence of Y. enterocolitica (48-50), but it has been proposed that after colonization, down-regulation of OPS is necessary for exposure or function of other virulence factors (47,51). Why such a process would be required in O2ac and not in other serotypes is unclear. It is conceivable that the O2c antigen plays no active role in pathogenesis, perhaps explaining its limited distribution.
With these studies, the molecular genetic basis for the known serological complexity in serotypes O1 and O2 is now resolved. Establishing the essential genetic complement for the biosynthesis of these OPS structures affords the opportunity to synthesize precise glycans for therapeutic applications in defined recombinant E. coli backgrounds. However, it has also provided the tools necessary to rule out isolates of the O2ac and O2aeh serotypes as major components in collections of clinical isolates.

Bacterial strains and growth conditions
The bacterial strains and plasmids used in this study are listed in Table 2. Cultures were grown either in lysogeny broth (LB) (52) or on LB agar, and the antibiotics, ampicillin (100 μg/ml) or chloramphenicol (34 μg/ml), were added when required. For growth of CWG286, overnight starter cultures contained 0.4% (w/v) D-glucose, and these were subcultured into LB containing 0.1% (w/v) D-galactose.

Construction of recombinant plasmids
KOD Hot Start DNA polymerase (Novagen) was used to amplify DNA fragments by PCR. Oligonucleotide primers (Sigma) along with their relevant characteristics are listed in Table S1. PCR products were purified from reactions using the PureLink PCR purification kit (Invitrogen). Plasmid and genomic DNA were purified with the PureLink plasmid miniprep kit and the PureLink genomic minikit, respectively (Invitrogen). Recombinant plasmids used in this study were constructed by cloning PCR fragments into the vector pBR322 by Gibson Assembly (New England Biolabs). Briefly, pBR322 was digested with the restriction endonucleases BamHI and SalI (New England Biolabs), and inserts were incorporated downstream of the tetracycline promoter by homologous recombination, mediated by primer sequences homologous to DNA flanking the restriction sites in pBR322. BamHI and SalI sites were retained in the recombinant plasmids. When cloning the convergently transcribed K. pneumoniae 5053 wbmVWX genes to produce pWQ395, potential promoter and/or regulatory elements were accommodated by cloning a DNA fragment that included non-coding sequences between the putative gene cluster and transposase genes flanking the locus (Fig. 2). Plasmid pWQ895 was constructed by replacing the wbmW gene with the kanamycin resistance gene from pKD4 by λ-red-mediated recombination (53,54). Plasmid constructs were assessed by restriction endonuclease digestion and confirmed by DNA sequencing performed by the Advanced Analysis Centre, Genomics Facility, University of Guelph.

Genomic DNA sequencing
Whole-genome shotgun sequence data (Illumina paired end reads; ~100× coverage) for K. pneumoniae 5053 and CWK2

Bioinformatics analyses
BLAST analyses (59) of nucleotide and amino acid sequences were performed using the National Center for Biotechnology Information (NCBI) server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and the standalone BLAST software. Protein motifs were searched using the Conserved Domain Database server (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (37) and the Pfam protein motif database (http://xfam.org). Insertion elements were identified using ISFinder (https://www-is.biotoul.fr) (60). The WbbY amino acid sequence used for BLASTP queries was from K. pneumoniae NTUH-K2044 (GenBank™ accession number KJ451390). The origin of the globally representative collection of 573 K. pneumoniae isolates has been described in detail elsewhere (15). In short, four different collections were analyzed: a global data set consisting of isolates from six different countries (61), a UK hospital data set collected over a period of 7 years, a Nepal hospital data set from a single outbreak (62), and a pre-antibiotic data set from strains isolated before the widespread use of antibiotics (63). The presence/absence analysis of the genetic elements rfb 2a , wbbYZ, gmlABC 2afg , gmlABD 2aeh , and wbmVWX was performed as described previously (15).
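The logic of interpreting a presence/absence profile for these loci can be sketched as follows. This is an illustrative simplification and not the pipeline described in (15): the function `predicted_antigens`, the locus keys, and the example isolates are all hypothetical, and an intact wbbYZ is assumed here to indicate an O1-type isolate. Detection of a locus does not guarantee that the antigen is expressed on the cell surface, so the output should be read as a genotype-based prediction only.

```python
from collections import Counter
from typing import Dict, List

# Modifying loci considered on top of the rfb 2a backbone locus, with the
# antigen feature each is assumed to confer (labels are illustrative).
MODIFIER_LOCI = {
    "wbbYZ": "O1",
    "gmlABC_2afg": "O2afg",
    "gmlABD_2aeh": "O2aeh",
    "wbmVWX": "O2c",
}


def predicted_antigens(present: Dict[str, bool]) -> List[str]:
    """Map locus presence/absence to antigen features predicted on an O2a backbone."""
    if not present.get("rfb_2a", False):
        # Without the backbone locus, the modifying loci have no O2a substrate.
        return []
    features = ["O2a"]
    for locus, feature in MODIFIER_LOCI.items():
        if present.get(locus, False):
            features.append(feature)
    return features


# Hypothetical isolates for illustration; these are not data from the study.
isolates = {
    "isolate_A": {"rfb_2a": True, "gmlABC_2afg": True},
    "isolate_B": {"rfb_2a": True, "wbmVWX": True},
    "isolate_C": {"rfb_2a": True, "wbbYZ": True},
    "isolate_D": {"rfb_2a": False},
}

# Tally how many isolates share each predicted profile.
tally = Counter("+".join(predicted_antigens(p)) or "no O2a backbone"
                for p in isolates.values())
for profile, count in tally.items():
    print(profile, count)
```

Returning a list of features rather than a single serotype name leaves room for isolates that carry more than one modifying locus.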

Isolation of O-polysaccharides
Overnight cultures (10 liters , which was grown at 30°C. Cells were harvested by centrifugation at 5,000 × g, washed with distilled water, and lyophilized. LPS was isolated by hot phenol-water extraction (67). Briefly, 4-5 g of dry cells were extracted with 300 ml of 45% (v/v) aqueous phenol at 70°C with constant stirring. After cooling, the phenol and water phases were separated by centrifugation, and the phenol phase was re-extracted with an equal volume of preheated water. The pooled water phase was dialyzed against tap water to remove the phenol and then concentrated with a rotary evaporator. The crude LPS solution was adjusted to pH 2 with cold aqueous CCl3COOH. Precipitated proteins and nucleic acids were removed by centrifugation at 12,000 × g, and the supernatant was dialyzed against distilled water and lyophilized. LPS isolated from E. coli DH5α [pWQ288, pWQ395] was subjected to an additional ultracentrifugation step at 105,000 × g for 16 h at 4°C. Purified LPS samples (150 mg) were hydrolyzed with 2% (v/v) acetic acid at 100°C until precipitation was observed (2-4 h). The lipid precipitate was removed by centrifugation at 13,000 × g, and the carbohydrate-containing supernatant was fractionated on a Sephadex G-50 superfine column (2.5 cm × 75 cm) in 50 mM pyridinium acetate buffer (pH 4.5) at a flow rate of 0.6 ml min⁻¹. Elution was monitored with a Smartline 2300 refractive index detector (Knauer).

Nuclear magnetic resonance spectroscopy
NMR studies were performed at the University of Guelph Advanced Analysis Centre. Polysaccharide samples were deuterium-exchanged by lyophilizing twice from 99.9% D2O and then analyzed as solutions in 99.96% D2O. NMR spectra were recorded at 50°C (O2a, O2afg, and O2aeh polysaccharides) and 30°C (O2ac polysaccharide) on a Bruker AvanceII 600-MHz spectrometer equipped with a cryoprobe. The Bruker TopSpin 3.2 program was used to acquire and process the NMR data. Mixing times of 100 and 200 ms were used in TOCSY and ROESY experiments, respectively. The HMBC experiment was optimized for the JH,C coupling constant of 8 Hz. The chemical shifts are referenced to 3-trimethylsilylpropanoate-2,2,3,3-d4 (δH 0, δC −1.6) added as an internal standard.