The Assembly System for the Outer Core Portion of R1- and R4-type Lipopolysaccharides of Escherichia coli

The major core oligosaccharide biosynthesis operons from prototype Escherichia coli strains displaying R1 and R4 lipopolysaccharide core types were polymerase chain reaction-amplified and analyzed. Comparison of deduced products of the open reading frames between the two regions indicates that all but two share total similarities of 94% or greater. Core oligosaccharide structures resulting from nonpolar insertion mutations in each gene of the core OS biosynthesis operon in the R1 strain allowed assignment of all of the glycosyltransferase enzymes required for outer core assembly. The difference between the R1 and R4 core oligosaccharides results from the specificity of the WaaV protein (a β1,3-glucosyltransferase) in R1 and WaaX (a β1,4-galactosyltransferase) in R4. Complementation of the waaV mutant of the R1 prototype strain with the waaX gene of the R4 strain converted the core oligosaccharide from an R1- to an R4-type lipopolysaccharide core molecule. Aside from generating core oligosaccharide specificity, the unique β-linked glucopyranosyl residue of the R1 core plays a crucial role in organization of the lipopolysaccharide. This residue provides a novel attachment site for lipid A-core-linked polysaccharides and distinguishes the R1-type LPS from existing models for enterobacterial lipopolysaccharides.

Lipopolysaccharide (LPS) is an essential component of the Gram-negative bacterial cell surface. It forms the major component of the outer leaflet of the outer membrane and, as such, plays an integral role in the interaction of the bacterium with the surrounding environment. The basic structure of LPS can be divided into three regions. From the outer membrane of the cell outward, these are the following: (i) a hydrophobic lipid component (lipid A), which anchors the LPS molecule in the outer membrane; (ii) a core oligosaccharide (core OS) consisting of 10-15 sugars; and (iii) a structurally diverse O-polysaccharide (O-PS), which provides an extended and hydrophilic cell surface layer that aids in resistance to complement-mediated serum killing. In the Enterobacteriaceae, the core OS is divided into two structural regions, an inner core region, which contains 3-deoxy-D-manno-oct-2-ulosonic acid and heptose, and an outer core region, which consists primarily of hexose and acetamido sugars. The inner core is highly conserved among Gram-negative enteric bacteria, while the outer core region shows diversity with respect to the type of sugars present and the linkages by which they are joined. There are five different core OS structures described in Escherichia coli (designated K-12, R1, R2, R3, and R4). All E. coli core OSs have three outer core hexose residues designated HexI, -II, and -III, where HexI is the first residue of the outer core OS. The structures of the outer core OSs of E. coli R1 and E. coli R4 are shown in Fig. 1A. Of the five recognized E. coli outer core OS structures, the R1 and R4 types are most closely related to one another, differing by only a single sugar residue in an equivalent position. This difference lies in the branch substitution at the HexII position of the outer core OS backbone. Specifically, the second Glc (HexII) of the outer core OS backbone is substituted by β-linked Glcp in R1 and β-linked Galp in R4.
These two core OS types share a common terminal disaccharide moiety (α-D-Galp-(1→2)-α-D-Galp), which is not found in E. coli K-12, R2, or R3 core types.
Lipid A-core and O-PSs are formed by independent assembly pathways (1-3), and genes for their assembly are found in different locations on the chromosome. The core OS biosynthesis, or waa (formerly known as rfa), region is located near 81 min on the E. coli K-12 linkage map and contains genes that are required for assembly of the K-12 core OS. Many of the genes at this locus are thought to encode glycosyltransferases that sequentially elongate the core OS on a lipid A acceptor molecule. The major operons of the core OS biosynthesis regions of E. coli K-12 (4), E. coli R2 (5), and Salmonella enterica sv. Typhimurium (5, 6) have been described. Most of the genes in these operons have homologs whose predicted products are highly conserved (greater than 70% total similarity) in all three organisms.
The R1 core OS structure is the most prevalent among clinical isolates of E. coli (7,8) and, along with the R4 core OS structure, is also found in Shigella spp. The assignment of unambiguous glycosyltransferase activity to particular genes in E. coli K-12 and S. enterica sv. Typhimurium has been hampered by the lack of precisely defined mutations and structural determination of mutant LPSs. This study addresses the assembly of the outer region of the R1 and R4 core types by analyzing the chemical structure of precisely defined insertion mutations within the core OS biosynthesis gene cluster. In this manner, we make definitive assignments of the functions of genes within the core OS biosynthesis operon. Given that there are unique side-branch substituents present within the outer core region of the R1 and R4 core OS types (i.e. sugars and linkages that are not present in any of the other characterized E. coli or S. enterica sv. Typhimurium core OS types), we identify two novel genes that influence the key determinants of these two LPS core types. In addition, we characterize a third gene, shared by the R1 and R4 core OS biosynthesis regions but not by other E. coli core OS biosynthesis regions, whose product defines the terminal α-D-Galp-(1→2)-α-D-Galp disaccharide that is common to only the R1 and R4 core OS structures. Finally, we identify the sugar residue that provides the attachment site for lipid A-core-linked polysaccharides to an R1-type core OS molecule. The location of the attachment site is different from that predicted based on the characterized attachment site in E. coli R2 (9) and S. enterica sv. Typhimurium (10).

EXPERIMENTAL PROCEDURES
Bacterial Strains and Plasmids-The bacterial strains and plasmids used in this study are listed in Table I.
Media and Growth Conditions-Bacterial strains were routinely grown in Luria-Bertani (LB) broth (11) at 37°C, unless otherwise stated. Growth medium was supplemented with ampicillin (100 μg/ml), chloramphenicol (30 μg/ml), gentamicin (15 μg/ml), or tetracycline (10 μg/ml) as necessary. L-Arabinose was used at a final concentration of 0.02% for growth and induction of strains containing pBAD24 derivatives.
DNA Methods-General methods for manipulation and computer analysis of DNA are as outlined previously (5).
PCR and Sequencing Techniques-Oligonucleotides were synthesized using a Perkin-Elmer 394 DNA synthesizer and sequencing was performed using an ABI 377 DNA sequencing apparatus (Perkin-Elmer) in the Guelph Molecular Supercentre at the University of Guelph. PCR amplification was performed using a GeneAmp PCR System 2400 from Perkin-Elmer. The Expand high fidelity enzyme mix (Boehringer Mannheim) was used as the polymerase enzyme in PCR reactions where products were greater than 5 kb. For generation of products of less than 5 kb in size, PwoI DNA polymerase (Boehringer Mannheim) was used. PCR amplification of the chromosomal region flanked by the waaC and waaA genes was performed as follows: (i) one initial cycle at 94°C for 1 min; (ii) 20 cycles at 94°C for 15 s and 68°C for 12 min; (iii) 16 auto cycles at 94°C for 15 s and 68°C for 12 min, with an autoextension at 68°C for 15 s/cycle; (iv) a final cycle at 72°C for 10 min. Oligonucleotide primers were based upon conserved regions of DNA sequence in the waaC and waaA genes of E. coli K-12 and S. enterica sv. Typhimurium. The sequence of these primers is as follows: (i) forward primer, 5′-ACGTTGCCCGCACTCACTGA-3′, which hybridizes near the 5′-end of waaC and (ii) complementary reverse primer, 5′-TTCGGTGGCAGGTAAGGTTC-3′, which hybridizes near the 3′-end of waaA. PCR products were purified using the QIAquick PCR purification kit from Qiagen. To ensure error-free sequencing, the sequence of each of the DNA strands was determined from the product of separate PCR runs. In the rare instances where a mismatch in sequence between strands occurred, a small region surrounding the mismatch was reamplified and sequenced.
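As a rough cross-check of the primers and cycling program described above, the following Python sketch computes primer length, GC fraction, and a rule-of-thumb melting temperature, and sums the programmed hold times. It is only an illustration: the function names are hypothetical, the 2 °C/4 °C (Wallace) rule is a coarse approximation that is not stated in the text, and the interpretation of "autoextension" as a 15-s lengthening of the 68°C step in each successive cycle of step (iii) is an assumption.

```python
# Primer sequences are quoted from the text; all helper names are illustrative.
FORWARD = "ACGTTGCCCGCACTCACTGA"  # hybridizes near the 5' end of waaC
REVERSE = "TTCGGTGGCAGGTAAGGTTC"  # hybridizes near the 3' end of waaA


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature: 2 degrees per A/T, 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def program_hold_seconds(autoextension_s: int = 15) -> int:
    """Sum of programmed hold times for steps (i)-(iv); ramp times are ignored.

    Assumption: the 68 degree step grows by `autoextension_s` in every
    successive cycle of step (iii), starting with the first of the 16 cycles.
    """
    denature, extend = 15, 12 * 60
    step_i = 60
    step_ii = 20 * (denature + extend)
    step_iii = sum(denature + extend + autoextension_s * k for k in range(1, 17))
    step_iv = 10 * 60
    return step_i + step_ii + step_iii + step_iv


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), gc_fraction(primer), wallace_tm(primer))
    print("programmed hold time (h):", program_hold_seconds() / 3600)
```

Under these assumptions the 12-min extension corresponds to roughly 1 min per kb for the 11.5-11.8-kb products reported below, and the programmed holds alone add up to about 8 h.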
In Vitro Mutagenesis and Gene Replacement-The waaG, -O, -T, -W, -V, and -L genes of the R1 core OS biosynthesis region were independently mutated by insertion of a nonpolar gentamicin resistance cassette (the aacC1 gene from Tn1696). Briefly, each of the genes was individually cloned from PCR-amplified products, and the aacC1 gene (obtained from plasmid pUCGM) was inserted into an appropriate site within the coding region of the specific waa gene (see Fig. 1C). Mutagenesis of the waaG, -O, -T, -W, and -V genes was performed by insertion of the aacC1 gene at a specific site (Fig. 1C), whereas mutagenesis of the waaL gene was performed by the replacement of a 396-base pair MluI fragment (present within the waaL coding region) with the aacC1 gene. A fragment containing the insertionally inactivated gene was then cloned into the suicide delivery vector pMAK705, and chromosomal gene replacement was performed as described previously (12). To verify phenotypes and the nonpolar nature of single mutations, each derivative of F470 was complemented with a recombinant pBAD24 plasmid carrying the corresponding PCR-amplified individual open reading frame.
Lipopolysaccharide Analysis by SDS-PAGE-Small scale LPS preparations were made from SDS-proteinase K whole-cell lysates as described elsewhere (13). LPS was separated on 10-20% gradient SDS-Tricine polyacrylamide gels, which were obtained from Novex (San Diego, CA). PAGE conditions were those recommended by the manufacturer, and the silver-staining procedure has been described (14). Western immunoblotting procedures have been described (12), as has production of polyclonal rabbit α-D-galactan I serum (15). LPS from an equivalent number of cells was loaded in each lane of SDS-polyacrylamide gels.
Preparation of Core Oligosaccharides-For isolation of LPS from CWG303 and CWG308, cells were first washed twice with each of ethanol, acetone, and petroleum ether. No preliminary cell-washing steps were performed on any other strain. Water-soluble LPSs were then obtained by hot water-phenol extraction of cells (16) and treated with 2% acetic acid at 100°C to cleave the acid-labile ketosidic linkage between the core OS and lipid A. The water-insoluble lipid A was isolated from the hydrolysate as a pellet by centrifugation (5000 × g, 5°C). The supernatant, which contained core OS, was purified on a column of BioGel P-2 (1 m × 1 cm) with water as eluent.
Sugar Composition and Methylation Linkage Analyses-Compositional and linkage data were interpreted based upon the previously published structure of the R1 core OS (17) and that of the R4 core OS (17,18). Sugar composition analysis was performed by the alditol acetate method (19). Hydrolysis of glycosidic bonds was achieved by using 4 M trifluoroacetic acid at 100°C for 4 h. The samples were then reduced in H2O with NaBD4 and acetylated with acetic anhydride using residual sodium acetate as the catalyst. Characterization of the alditol acetate derivatives was performed by gas-liquid chromatography-mass spectrometry using a Hewlett-Packard chromatograph equipped with a 30 m DB-17 capillary column (210°C (30 min) to 240°C at 2°C/min). Mass spectrometry in the electron impact mode was recorded using a Varian Saturn II mass spectrometer. Linkage analysis was carried out by methylation according to the procedure of Ciucanu and Kerek (20). The permethylated alditol acetate derivatives were fully characterized by gas-liquid chromatography-mass spectrometry in the electron impact mode using a column of DB-17 operated isothermally at 190°C for 60 min.
Nuclear Magnetic Resonance Spectroscopy-1H NMR spectra of the core OSs were recorded on a Bruker AMX 500 spectrometer at 300 K using standard Bruker software. Prior to performing the NMR experiments, the samples were lyophilized three times with D2O (99.9%). The internal reference for 1H NMR was the HOD peak (δH 4.786).

Identification of the Major Core OS Biosynthesis Operon of E. coli R1- and R4-type LPS Strains-The structure of the outer core region of E. coli R1 and R4 LPS molecules is shown in Fig. 1A. Similarities between the known sequences of the waaA (encodes the 3-deoxy-D-manno-oct-2-ulosonic acid transferase) and waaC (encodes the HepI transferase) genes of E. coli K-12 and S. enterica sv. Typhimurium and the highly conserved inner core OS structures led us to assume that similar sequences would also exist in E. coli strains displaying other core OS types. This led to a PCR strategy already used to characterize the E. coli R2 core OS biosynthesis region (5). Primers were designed such that the region between the waaA and waaC genes of E. coli F470 (LPS core type R1) and F2513 (LPS core type R4) was amplified. Sequencing of the PCR products indicated that the amplified region from F470 was 11.8 kb in size and that from F2513 was 11.5 kb in size. Both products contain 11 open reading frames. The two regions encode predicted products with high degrees of total similarity in all but two genes (Fig. 1B). We have established the role of three enzymes, WaaP, WaaQ, and WaaY, in assembly of the heptose region in the Enterobacteriaceae (21). WaaP phosphorylates the HepI residue in the core OS and is a prerequisite for other modifications by WaaY (phosphorylates HepII) and WaaQ (adds HepIII residue). A waaL homolog was identified in both the R1 and R4 core OS biosynthesis regions by hydropathy profile similarities between its predicted product and known WaaL proteins of E. coli K-12 (22), R2 (5), and S. enterica sv. Typhimurium (6). The WaaL protein is the only protein identified to date whose function involves ligation of cell surface polymers to lipid A-core molecules (1).
FIG. 1. Structure of the outer core OS from E. coli F470 (R1 LPS core type) and E. coli F2513 (R4 LPS core type) and genetic organization and mutation of their chromosomal core OS biosynthesis regions. A, structure of the outer core OSs from the LPS of E. coli R1 and R4. All sugars are in the pyranose configuration, and the linkages are α unless otherwise indicated. Also shown is the site of ligation of O-PS structures to an R1 core OS. The data that identify genetic determinants for the assembly of the core OS are reported in this study. B, physical maps of the sequenced regions from the chromosomes of E. coli F470 and F2513. Numbers indicate percentage of identity and similarity at both the amino acid and nucleotide level for respective homologs. Genes involved in the synthesis of the outer core OS are shown in white, while the waaL homolog (encoding the lipid A-core:surface polymer ligase protein) is shown in gray. The nucleotide sequence has been deposited in the GenBank™ data base under accession numbers AF019746 (F470) and AF019747 (F2513). C, map of the R1 core OS biosynthesis region indicating insertion sites of the nonpolar aacC1 gene cassette (represented by a triangle). Only restriction endonuclease sites used for aacC1 insertion are shown. The CWG*** designation of the resulting mutant strain is given (above the triangles). Also indicated, in the lower half of the figure, are the regions of F470 chromosomal DNA that were cloned into pBAD24 (recombinant plasmids are designated pWQ***) for complementation of respective mutant strains. The cloned region from the F2513 chromosome, which encompasses the waaX gene (creating plasmid pWQ908), is as indicated.
HexI Transfer-The core OS from S. enterica sv. Typhimurium as well as all of the E. coli core OS types have an α-D-Glcp-(1→3)-L-glycero-α-D-manno-Hepp linkage, which defines the junction between the outer and inner core OS regions. The product of the waaG gene is involved in the formation of this disaccharide in S. enterica sv. Typhimurium (23), and the LPS structure of an E. coli K-12 waaG mutant is in agreement with this activity (24). Both the R1 and R4 core OS biosynthesis regions contain waaG homologs whose products are virtually identical (99.5% identity) (Fig. 1B). Moreover, given the conserved nature of the α-D-Glcp-(1→3)-L-glycero-α-D-manno-Hepp linkage among E. coli and S. enterica sv. Typhimurium, all WaaG proteins (i.e. those of E. coli R1, R2, and R4 and S. enterica sv. Typhimurium) share total similarities of greater than 90%. Consistent with the assignment of the R1 and R4 WaaG proteins as UDP-glucose:(heptosyl) LPS α1,3-glucosyltransferases, LPS of strain CWG303 (waaG::aacC1 derivative of F470) migrates faster than F470 LPS in SDS-PAGE (Fig. 2A). Compositional data (not shown) and linkage analysis (Table II) of the methylated CWG303 core OS indicate the absence of hexose sugars in the core OS molecule. Complementation of CWG303 with pWQ903 (which carries the R1 waaG gene) results in an LPS with equivalent mobility to that of F470 in SDS-PAGE (Fig. 2B).
HexII Transfer-The WaaI protein of S. enterica sv. Typhimurium and the WaaO proteins of E. coli K-12 and R2 are predicted to be HexII glycosyltransferases (5,25,26). The product of the gene designated waaO in the R1 and R4 core OS biosynthesis clusters shares 64% total similarity with the predicted WaaO protein of E. coli K-12 and R2 and 66% total similarity with the WaaI protein of S. enterica sv. Typhimurium. The position of waaO within the core OS biosynthesis region and the similarity of its product to the other putative HexII transferases suggest an involvement in the transfer of the HexII residue of the core OS backbone in R1 LPS core-type strains. The R1 and R4 WaaO proteins contain the consensus sequence features of the WaaIJ family (5) of α-glycosyltransferases (data not shown). Insertional inactivation of the waaO gene of F470 (strain CWG308) results in a truncated LPS molecule that migrates slower in SDS-PAGE than CWG303 LPS (Fig. 2A). Moreover, linkage data of methylated core OS from CWG308 indicate that the only hexose present is terminal Glcp (Table II). Complementation of CWG308 with pWQ904 (which carries the R1 waaO gene) returns the LPS mobility in SDS-PAGE to that of F470 (Fig. 2B). These data identify WaaO as the UDP-glucose:(glucosyl) LPS α1,3-glucosyltransferase. Given that the R4 core OS structure predicts an identical HexII transferase activity and that the R4 core OS biosynthesis region contains a virtually identical homolog of the R1 WaaO protein (99.7% amino acid identity) (see Fig. 1B), the equivalent gene in R4 has also been designated waaO.
HexIII Transfer-The transfer of the HexIII residue of the core OS backbone has historically been attributed to the activity of the product of waaJ (formerly rfaJ) in both S. enterica sv. Typhimurium and E. coli K-12. The nomenclature for the gene involved in this substitution has recently been changed to waaR in E. coli K-12 to differentiate its activity (formation of α-D-Glcp-(1→2)-α-D-Glcp) from that of waaJ (formation of α-D-Glcp-(1→2)-α-D-Galp) in S. enterica sv. Typhimurium (5). HexIII addition in the R1 and R4 core OSs involves formation of α-D-Galp-(1→2)-α-D-Glcp, a disaccharide unique to the R1 and R4 core OS structures. The gene encoding the enzyme with this activity is therefore given the unique designation waaT. Tentative assignment of this gene in the R1 and R4 regions was initially based on its position immediately downstream of the gene involved in HexII transfer (as is the case in the S. enterica sv. Typhimurium, E. coli K-12, and E. coli R2 core OS biosynthesis regions). The predicted WaaT proteins of R1 and R4 share 99.4% identity (Fig. 1B). Moreover, the WaaT proteins share 58% total similarity with the WaaJ protein of S. enterica sv. Typhimurium and 57% total similarity with the WaaR protein of E. coli K-12 and R2. The WaaT proteins also contain the consensus sequence features of the WaaIJ family (5) of α-glycosyltransferases (data not shown). Insertional inactivation of the waaT gene of F470 (strain CWG309) results in a truncated LPS molecule that migrates slower than LPS of CWG308 in SDS-PAGE (Fig. 2A), and linkage analysis of the methylated CWG309 core OS indicates that CWG309 contains both terminal Glcp and 3-linked Glcp (Table II). Introduction of plasmid pWQ905 (which carries the R1 waaT gene) into CWG309 returns the LPS profile in SDS-PAGE to that of F470 LPS (Fig. 2B).
These data confirm that the waaT gene product is the UDP-galactose:(glucosyl) LPS α1,2-galactosyltransferase, involved in the addition of the HexIII residue of the R1 and, by analogy, R4 core OSs.
HexIII  (Fig. 1A). The presence of an α-D-Galp-(1→2)-α-D-Galp disaccharide is unique to the R1 and R4 E. coli core types; therefore, the enzymes responsible for the formation of this disaccharide are predicted to be unique to these core type organisms as well. The open reading frame designated waaW is present in both the R1 and R4 clusters, and, as indicated in Fig. 1B, the product of this gene is highly conserved between the two organisms (93% identity). The predicted WaaW protein of R1 is 342 amino acids in length, has a molecular mass of 39.4 kDa, and has an estimated pI of 5.8. The predicted WaaW protein of R4 is 341 amino acids in length, has a molecular mass of 39.3 kDa, and has an estimated pI of 5.2.
The pI values of the predicted WaaW proteins are surprisingly low when compared with other core OS glycosyltransferases, whose pI values typically fall in a range of between 8 and 10. The WaaW proteins share only limited regions of identity with glycosyltransferases encoded by other E. coli core OS biosynthesis regions; however, they do contain the consensus sequence features found in the WaaIJ family (5) of UDP-α-galactosyl- or UDP-α-glucosyltransferases (data not shown). Given the high degree of similarity between the two WaaW proteins and that the α-D-Galp-(1→2)-α-D-Galp substitution is present in both core OS structures, it was considered likely that the WaaW protein is involved in this substitution.
Insertional inactivation of the F470 waaW gene (strain CWG310) results in a truncated LPS with SDS-PAGE mobility slower than that of CWG309 (Fig. 2A). Linkage analysis of the methylated core OS of CWG310 indicates that all three core OS backbone sugars are present in CWG310 but that the terminal Galp side branch is absent. Interestingly, linkage analysis indicated that the HexII side branch was also eliminated in this mutant based on the observed loss of terminal and 2,3-linked Glcp and concomitant appearance of 2-linked Glcp. This effect is due to a single mutation in waaW, as wild-type core OS is restored in CWG310 complemented by plasmid pWQ906 (which carries the R1 waaW gene) (Fig. 2B). The addition of the HexII substitution in R1 is therefore dependent on the prior addition of the HexIII side branch.
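The reasoning behind this interpretation of the methylation data can be sketched in a few lines of Python. In methylation linkage analysis, each residue is reported by the positions at which it carries another sugar, so removing a branch changes the label of its acceptor. The example below is a minimal illustration, not the authors' analysis: the residue labels (HexI, SideGal, BetaGlc) and the function name are hypothetical, and the structure encodes only the outer core linkages given in the text (HexII on O-3 of HexI, HexIII on O-2 of HexII, the side-branch Gal on O-2 of HexIII, and the β-Glc on O-3 of HexII).

```python
from collections import Counter

# Each entry: residue label -> (sugar, [(substituted position, child label), ...]).
# Labels are invented for this sketch; linkages follow the text.
R1_CORE = {
    "HexI": ("Glc", [(3, "HexII")]),
    "HexII": ("Glc", [(2, "HexIII"), (3, "BetaGlc")]),
    "HexIII": ("Gal", [(2, "SideGal")]),
    "SideGal": ("Gal", []),
    "BetaGlc": ("Glc", []),
}


def linkage_profile(core, root="HexI", missing=frozenset()):
    """Count expected linkage labels, e.g. '2,3-linked Glc' or 'terminal Gal'.

    Residues in `missing` (and anything attached to them) are left out.
    """
    profile = Counter()
    stack = [root] if root not in missing else []
    while stack:
        name = stack.pop()
        sugar, branches = core[name]
        present = [(pos, child) for pos, child in branches if child not in missing]
        positions = sorted(pos for pos, _ in present)
        if positions:
            label = ",".join(str(p) for p in positions) + "-linked"
        else:
            label = "terminal"
        profile[f"{label} {sugar}"] += 1
        stack.extend(child for _, child in present)
    return profile


if __name__ == "__main__":
    wild_type = linkage_profile(R1_CORE)
    # Major species in CWG310: no side-branch Gal and no beta-Glc on HexII.
    cwg310 = linkage_profile(R1_CORE, missing={"SideGal", "BetaGlc"})
    print("lost:  ", sorted(wild_type - cwg310))
    print("gained:", sorted(cwg310 - wild_type))
```

With both branches removed, the model loses terminal Glc and 2,3-linked Glc and gains 2-linked Glc, which is the pattern reported for CWG310; it also predicts that HexIII changes from 2-linked Gal to terminal Gal.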
The waaV Gene of E. coli F470 Encodes a UDP-glucose:(Glucosyl) LPS β1,3-Glucosyltransferase-The predicted translation product of waaV is 327 amino acids in length, with a molecular mass of 38.8 kDa and an estimated pI of 9.0. BLASTP searches of the data bases indicate that WaaV shows limited similarity to a number of known and putative β-glycosyltransferases, a selected few of which are indicated in Table III. Those proteins with a known function all catalyze the formation of a β-glycosidic linkage from nucleotide diphosphosugar donors in the α-configuration. A consensus sequence feature identified through BLASTP searches is the DXDD motif (present as D93DDD96 in WaaV) located approximately 90 amino acids from the N terminus of these proteins. Utilizing HCA analysis, this motif was shown to lie in a structurally conserved region common to several β-glycosyltransferase proteins (29,30). An HCA plot was constructed for WaaV and aligned with an HCA plot of ExoU from Rhizobium meliloti (Fig. 3A). ExoU catalyzes the formation of a β-D-Glcp-(1→6)-α-D-Glcp linkage in the synthesis of succinoglycan in R. meliloti and is the prototype of a growing family of β-glycosyltransferases, the ExoU family. HCA plots have been performed on the other transferases shown in Table III (29,30) and all show similar profiles. Members of the ExoU family of β-glycosyltransferases possess a similar structural domain (designated domain A) in the N-terminal 100 amino acid segment of the proteins. This conserved domain contains alternating regions of β-strands and α-helical loops (Fig. 3A). Typical of this family, conserved Asp residues are situated at the C-terminal ends of each of the β2 and β4 segments (indicated by circled Asp residues in Fig. 3A). Given their position in flexible loops at the C-terminal ends of β-strands, Saxena et al. (30) have speculated that these conserved, acidic Asp residues may be catalytic.
The WaaV and ExoU proteins do not possess a domain B (data not shown) whose presence, in addition to domain A, is characteristic of processive β-glycosyltransferases. This observation suggests that WaaV is nonprocessive and only adds one β-linked sugar residue. Given that there is a single β-D-Glcp-(1→3)-α-D-Glcp linkage in the core OS of R1-type LPS, it was considered likely that the product of the waaV gene encodes the necessary UDP-glucose:(glucosyl) LPS β1,3-glucosyltransferase activity.
Insertional inactivation of the waaV gene of F470 (CWG311) results in a truncated LPS, which, in SDS-PAGE, migrates slower than CWG310 (Fig. 2A). This phenotype is complemented by introduction of plasmid pWQ907 (which carries the R1 waaV gene) into CWG311 (Fig. 2B). Linkage analysis of the methylated core OS from CWG311 indicated the presence of all linkages found in the core OS of F470 except for the Glcp substitution of GlcII (Table II). While the 1H NMR spectrum of F470 core OS contains a signal at 4.7 ppm (indicative of one β-linked hexopyranosyl residue), the 1H NMR spectrum of CWG311 lacks a signal in the 4-5 ppm range (data not shown). Moreover, the 1H NMR spectrum of core OS from CWG311 containing pWQ907 showed a signal at 4.7 ppm, indicating restoration of the β-linked sugar. Linkage analysis of the methylated core OS of CWG311(pWQ907) identified sugars and linkages identical to that for LPS from F470. The presence of very minor amounts of 2-linked Glcp (see Table II) suggests that the complementation of CWG311 by pWQ907 did not occur at 100% efficiency. Taken together, these data confirm that the waaV gene encodes the UDP-glucose:(glucosyl) LPS β1,3-glucosyltransferase which substitutes GlcII (HexII) with β1,3-linked Glc. Based on all data presented so far, this substitution would appear to be the final sugar added in the assembly of the outer core OS of R1-type LPS. With the assignment of WaaV, all of the transferases for the assembly of the outer portion of the core OS of R1-type LPS have been identified (Fig. 1A).
TABLE II footnotes. a, Predicted outer core OS structure, which also indicates the attachment of the outer core OS to the inner core HepII residue. Precise linkages can be found in Fig. 1A. Although β-linked Glc is shown for clarity, determination of anomeric configuration requires 1H NMR analysis. b, Note that for the structure of CWG310, approximately 10% of the core OS molecules contain the HexII side branch Glcp.
The waaX Gene of E. coli F2513 Encodes a UDP-galactose:(Glucosyl) LPS β1,4-Galactosyltransferase-The only difference between the structure of the outer core OS from E. coli R1 and R4-type LPS strains is the presence of a side branch β-linked Glcp attached to GlcII in R1-type LPS and a side branch β-linked Galp at the equivalent position in R4-type LPS. As discussed in the previous section, examination of the gene clusters and their predicted products led to the assumption that WaaV and WaaX differentiated the R1 and R4 outer core OS. The predicted WaaX protein is 257 amino acids in length, has a molecular mass of 30.6 kDa, and has an estimated pI of 9.5. BLASTP searches of the data bases using the deduced WaaX protein sequence identified similarity to a variety of known and putative β-glycosyltransferases, some of which are listed in Table III. Interestingly, none of those proteins that showed similarity to WaaV of R1 were identified as homologs of WaaX through these searches. Similarity between WaaX and related proteins is present within the N-terminal 100 amino acids. In particular, two conserved Arg residues in the N terminus, a Phe-Xaa-Phe-Phe-Asp motif located 30-40 amino acids from the N terminus and a Glu-Asp-Asp motif located approximately 90 amino acids from the N terminus, were identified as consensus sequence features of this family of proteins. A representative of this family, LgtB of Neisseria gonorrhoeae, catalyzes the formation of a β-D-Galp-(1→4)-α-D-GlcpNAc linkage in the synthesis of lipooligosaccharide (31). HCA plots of WaaX and LgtB indicate that the conserved sequence features of the proteins exist in similar structural regions of the proteins (Fig. 3B). As with the WaaV family of β-glycosyltransferases, it is tempting to speculate that the conserved Asp residues catalyze the formation of a β-linkage, but direct evidence for this is presently lacking.
In order to confirm that WaaX is the β-glycosyltransferase for R4 core OS synthesis, plasmid pWQ908 (which carries the R4 waaX gene) was introduced into CWG311, the R1 core OS strain that lacks the β-linked Glcp side branch. The remainder of the glycosyltransferases required for the assembly of the core OS of these two core-type strains exhibit greater than 93% identity. It therefore seemed likely that any protein-protein interactions that are essential in the core OS assembly complex would be maintained with the introduction of a single heterologous glycosyltransferase into the R1 core OS assembly system. SDS-PAGE analysis of the LPS of CWG311(pWQ908) revealed that it migrates identically to that of parental F470 LPS (data not shown). Linkage analysis of methylated core OS from CWG311(pWQ908) indicates that it contains 2,4-linked Glcp, no terminal Glcp, and a large quantity of terminal Galp (Table II), which is characteristic of a similar analysis of R4 core OS (18). Moreover, 1H NMR identified the characteristic β-linkage signal at 4.5 ppm (data not shown). These data confirm that the WaaX protein of an R4 strain is the UDP-galactose:(glucosyl) LPS β1,4-galactosyltransferase, which adds the β-linked Galp side branch to the GlcII residue of the R4 core OS. With the identification of WaaX function, all of the glycosyltransferases required for the assembly of R4-type LPS have been identified and are shown in Fig. 1A.
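The gene assignments made so far can be summarized as a small rule-based model in which each transferase adds one residue to a defined position on an acceptor that must already be present. The Python sketch below is an illustration under stated assumptions rather than a description of the assembly complex: the residue labels, the `Transferase` record, and the function names are invented here; the requirement that WaaV act after the WaaW product simplifies the CWG310 result (about 10% of molecules still carry the branch); and no such prerequisite is assumed for WaaX because the text does not report one.

```python
from typing import NamedTuple


class Transferase(NamedTuple):
    gene: str
    adds: str              # label of the residue added (labels invented here)
    acceptor: str          # residue that receives the new sugar
    position: int          # hydroxyl on the acceptor that is substituted
    requires: tuple = ()   # other residues that must already be present


RULES = [
    Transferase("waaG", "HexI Glc", "HepII", 3),
    Transferase("waaO", "HexII Glc", "HexI Glc", 3),
    Transferase("waaT", "HexIII Gal", "HexII Glc", 2),
    Transferase("waaW", "side-branch Gal", "HexIII Gal", 2),
    # Simplification of the CWG310 data: beta-Glc follows the side-branch Gal.
    Transferase("waaV", "beta-Glc", "HexII Glc", 3, ("side-branch Gal",)),
    # Prerequisites for WaaX are not reported in the text; none are assumed.
    Transferase("waaX", "beta-Gal", "HexII Glc", 4),
]


def assemble(genes):
    """Return {residue: (acceptor, position)} for a set of functional genes."""
    present = {"HepII": None}  # the inner core acceptor is assumed complete
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            if (rule.gene in genes and rule.adds not in present
                    and rule.acceptor in present
                    and all(r in present for r in rule.requires)):
                present[rule.adds] = (rule.acceptor, rule.position)
                changed = True
    return present


def substituted_positions(structure, residue):
    """Positions on `residue` that carry another sugar, sorted ascending."""
    return sorted(link[1] for link in structure.values()
                  if link is not None and link[0] == residue)


if __name__ == "__main__":
    r1 = {"waaG", "waaO", "waaT", "waaW", "waaV"}
    cwg311 = r1 - {"waaV"}
    for name, genes in (("F470", r1), ("CWG311", cwg311),
                        ("CWG311(pWQ908)", cwg311 | {"waaX"})):
        print(name, substituted_positions(assemble(genes), "HexII Glc"))
```

In this model the HexII residue is substituted at positions 2 and 3 in the wild-type R1 gene set, at position 2 only when waaV is removed, and at positions 2 and 4 when waaX is supplied in its place, matching the 2,3-linked, 2-linked, and 2,4-linked Glcp reported for F470, CWG311, and CWG311(pWQ908).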
Identification of the Linkage Site for Polysaccharide Attachment to the R1-type Core OS-Lipid A-core molecules may be "capped" with an O-PS before the LPS molecule is translocated to the outer membrane and exposed to the extracellular milieu. The enzyme that catalyzes the transfer of O-PS to the lipid A-core molecule is WaaL. The predicted WaaL protein was identified in each of the R1 and R4 core OS biosynthesis regions based on its higher order structure. Although the WaaL proteins of E. coli K-12 and R2, as well as that of S. enterica sv. Typhimurium, collectively share little primary sequence similarity, they are all predicted to be integral membrane proteins with more than eight membrane-spanning domains. These proteins contain hydrophilic domains of similar size and distribution. Mutations in these proteins obviate ligation between the O-PS and lipid A-core portions of the LPS molecule but do not result in core OS truncation. The R1 core OS prototype strain, F470, is a rough LPS derivative of O8:K27 that does not synthesize an O-PS but still assembles a complete R1-type core OS.
To study the attachment of O-PS to this lipid A-core acceptor, we therefore introduced a heterologous O-PS cluster, present on plasmid pWQ3, into F470 and derivatives. Plasmid pWQ3 is a pRK404 derivative that contains all of the genes necessary for the synthesis of the D-galactan I O-PS of Klebsiella pneumoniae O1 (32). As shown in Fig. 4, A and B, lane 1, F470 readily ligates D-galactan I to lipid A-core. The ligation-defective phenotype of an F470 waaL::aacC1 derivative (CWG317) was confirmed by the lack of observable D-galactan I in the presence of pWQ3 (Fig. 4, A and B, lane 2). The migration of CWG317 lipid A-core is identical to wild-type F470, indicating that a complete core OS molecule is still assembled.
The HexIII residue provides the ligation site for O-PS in S. enterica sv. Typhimurium (10) and E. coli R2 (9), and ligation activity is strictly dependent on the presence of a HexIII side branch substituent (5,6). Surprisingly, smooth LPS was still produced when pWQ3 was introduced into CWG310, which lacks a HexIII side branch substituent. Although the amount of ligated O-PS was dramatically reduced and not readily visible in silver-stained SDS-PAGE (Fig. 4A, lane 4), smooth LPS was observed by using more sensitive Western immunoblotting techniques (Fig. 4B, lane 4). These data preclude a conserved ligation reaction, at least in terms of the core OS acceptor molecule, between the core OSs of E. coli R1, R2, and S. enterica sv. Typhimurium. The low level of ligation could be explained by the influence of the waaW mutation in CWG310 upon the addition of the β-linked Glcp residue on HexII (Table II). Elimination of the β-linked Glcp side branch by the waaV mutation in CWG311 resulted in an inability to ligate any O-PS (Fig. 4, A and B, lane 3). These data show that the attachment site for O-PSs is the β-Glcp residue. Examination of the structure of the linkage region in smooth LPS is not possible due to the overwhelming signals from the O-PS sugars. To circumvent this problem, we examined strain CWG294 (serotype O8:K40). The parent strain of CWG294 attaches both O8 antigen and K40 antigen to an R1 core OS by the action of the ligase enzyme (12). However, CWG294 attaches only a single repeat unit of K40 antigen due to a wzy (K40 polymerase) mutation (12). The core OS fraction with the attached K40 repeat unit was purified from CWG294 and examined by methylation linkage analysis. All components expected for the R1 complete core OS were identified (data not shown). However, the terminal Glcp signal (from the β-linked residue) was eliminated and replaced by 3-linked Glcp, confirming the β-glucopyranosyl residue as the linkage site in a native LPS (see Fig. 1A).
Although core OS biosynthesis in E. coli K-12 and S. enterica sv. Typhimurium has been studied, definitive assignment of glycosyltransferase activity to specific gene products is often lacking. This study of the R1 core OS biosynthesis region (and, by analogy, R4) is the first example of a single system where all of the glycosyltransferases required for the biosynthesis of the outer core OS have been unambiguously assigned. The structures of the R1 and R4 outer core OSs and the genetic determinants involved in their assembly (as determined in this study) are shown in Fig. 1A. By combining genetic manipulation with chemical structure determination, glycosyltransferases shared by the R1 and R4 systems have been identified, and the single variant glycosyltransferase that gives rise to R1/R4 specificity has been defined. The overall organization of the R1 and R4 waa clusters is similar to those of E. coli K-12, R2, and S. enterica sv. Typhimurium, especially with respect to the waaQ, waaP, and waaY genes, which are required for biosynthesis of the highly conserved heptose region of the core OS (21). Interestingly, some differences do exist. First, there is a somewhat surprising absence of waaS and waaZ homologs in the waa region of R1 and R4. It has previously been suggested that these genes encode proteins that direct the formation of an "LOS" form of LPS, a form that is not capped by an O-PS (22). Whether strains containing R1- or R4-type LPS lack a second form of LPS is not known, and the biological impact of these differences is unclear. It is intriguing to note, however, that strains displaying an R1-type core predominate in clinical E. coli (7,8), and both R1 and R4 core types occur in Shigella spp. (33).
The R1 and R4 core OS biosynthesis regions each contain two genes encoding glycosyltransferases that are not found in the core OS biosynthesis region of S. enterica sv. Typhimurium, E. coli K-12, or E. coli R2. One of these transferases, WaaW, is common to both the R1 and R4 clusters. We have identified WaaW as the UDP-galactose:(galactosyl) LPS α1,2-galactosyltransferase enzyme that adds the side branch α-D-Galp onto the terminal α-D-Galp residue within the core OS backbone of both the R1 and R4 core OS structures. The WaaW protein contains conserved sequence features that are found in the family of HexII and HexIII transferases (5). An interesting feature of the WaaW protein is that it has a predicted pI that is significantly lower than those of other core OS glycosyltransferases. Typically, core OS glycosyltransferases (including WaaG and WaaK, among others) have pI values ranging from 8 to 10. Glycosyltransferases involved in the synthesis of the core OS portion of LPS are all predicted to be peripheral membrane-associated proteins, and it is therefore not surprising that the majority of these proteins have a net positive charge at a physiological pH, enabling them to maintain an association with the cytoplasmic membrane. A net negative charge for the WaaW protein may allow for an association with other glycosyltransferases in a region of a glycosyltransferase complex that is further removed from the inner face of the cytoplasmic membrane. This possible interaction would appear to be unique to the core OS assembly systems of R1 and R4. To date, no other core OS glycosyltransferases have been identified with such a low pI value.
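The relationship between a predicted pI and the sign of the net charge at physiological pH can be illustrated with a minimal sketch based on the Henderson-Hasselbalch equation. This is not the method used to obtain the pI values discussed above. The side-chain and terminal pKa values are approximate assumptions (published scales differ), the function names are hypothetical, and the two short peptides are illustrative and are not derived from WaaW or any other protein in this study.

```python
# Approximate pKa values (assumed; published scales differ slightly).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # protonated form carries +1
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # deprotonated form carries -1
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def net_charge(sequence, ph):
    """Estimate the net charge of a protein sequence at a given pH."""
    # Terminal amino group (positive) and terminal carboxyl group (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH at which the estimated net charge is zero, by bisection."""
    low, high = 0.0, 14.0
    # Net charge decreases monotonically with pH, so bisection converges.
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for name, seq in [("basic peptide", "MKKRKA"), ("acidic peptide", "MDDEEA")]:
        print(name, round(isoelectric_point(seq), 2), round(net_charge(seq, 7.4), 2))
```

Under these assumptions, a sequence with a pI above physiological pH carries a net positive charge at that pH, whereas a sequence with a low pI carries a net negative charge, which is the distinction drawn above between WaaW and the other core OS glycosyltransferases.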
The WaaV glycosyltransferase protein differentiates the R1 core OS assembly system from R4. The WaaV protein is the UDP-glucose:(glucosyl) LPS β1,3-glucosyltransferase. This enzyme adds the side branch β-linked Glcp residue to GlcII of the R1 core OS backbone. HCA analysis classifies this protein as a member of the ExoU family of nonprocessive β-glycosyltransferases (29, 30). These proteins form β-glycosidic linkages from α-linked sugar nucleotide precursors, and flexible loop regions within the proteins contain putative catalytic residues. As can be seen in Fig. 3, the HCA plot of the N-terminal region of WaaV predicts a structure similar to those of the equivalent domains in proteins within the ExoU family. WaaV contains several potential catalytic Asp residues (Asp43, Asp93, Asp95, and Asp96) in locations similar to those within other members of this family of proteins.
The WaaX protein differentiates the R4-type LPS core assembly system from R1. It is the UDP-galactose:(glucosyl) LPS β1,4-galactosyltransferase, which adds the side branch β-linked Galp residue to GlcII of the R4 core OS structure and replaces the WaaV-mediated activity of the R1 system. HCA analysis predicts that potential catalytic residues in WaaX (Asp35, Glu90, Asp91, and Asp92) occur in flexible loop structures within the protein. Their position within the protein is similar to that of the potential catalytic Asp residues found in the ExoU family of proteins, but the conservation of domain A structure (typical of the ExoU family) is somewhat lacking in the WaaX family of proteins (Table III). As highlighted in Fig. 3 and Table III, the products of the waaV and waaX genes show limited similarity to two different families of β-glycosyltransferases. This argues against simple mutation of one of these genes and subsequent genetic drift in the evolution and differentiation of R1 and R4 core OS types. Rather, a more likely scenario is that these genes were laterally transferred along with the adjacent waaL to an ancestral core OS region. This event thus resulted in the generation of a new E. coli LPS core chemotype.
Such lateral transfer events also explain the unusual location of waaL (which encodes the O-PS ligase) as part of the waaC operon, rather than its more typical position as the last gene of the waaQ operon (5). As noted for E. coli K-12 and S. enterica sv. Typhimurium, differential regulation occurs between these two operons. The waaQ-containing operon requires the antitermination effects of RfaH to transcribe genes distal to the promoter region, whereas the shorter waaC-containing operon is under the control of three promoters in E. coli K-12, one of which appears to be a heat-shock promoter (34). The lack of sequence data upstream of waaC in any organism other than E. coli K-12 and S. enterica sv. Typhimurium has prevented any insight into the possible regulation of this operon in the R1 or R4 systems. The biological implication of this altered genetic organization and its influence on ligase expression is not known.
A WaaL-deficient mutant of E. coli F470 fails to ligate a reporter O-PS to lipid A-core, a result that might be predicted based on the phenotype of waaL mutants of E. coli K-12, R2, and S. enterica sv. Typhimurium. The attachment site for O-PSs in E. coli R2 and S. enterica sv. Typhimurium is the HexIII residue, in both cases a Glcp residue (9,10). Further, the HexIII side branch α-linked GlcpNAc residue, which occurs in both E. coli R2 and S. enterica sv. Typhimurium, is required for ligation activity (5,6). In contrast, the R1 core OS contains a unique ligation site at the HexII side branch β-Glcp residue, giving the R1 smooth LPS a fundamentally different organization from that assumed for the Gram-negative enteric bacteria based on the widely accepted prototype, S. enterica sv. Typhimurium. Pairwise alignments of the known WaaL proteins indicate that similarities range from as low as 13.1% identity/28.9% total similarity for WaaL proteins of E. coli R4 and S. enterica sv. Typhimurium to as high as 65.8% identity/81.1% total similarity for WaaL proteins of E. coli R2 and S. enterica sv. Typhimurium. The E. coli R1 WaaL protein is most closely related to WaaL of R4 (33.1% identity; 54.0% total similarity). Given this, it is reasonable to assume that the attachment site for O-PSs to R4-type core OS molecules is the β-linked Galp residue. Previous studies have shown that the core OS structure dictates ligase specificity, whereas the O-PS does not play a role. As an example, E. coli K-12 is capable of ligating many heterologous O-PS structures to its core OS molecule. However, the ligase enzyme of E. coli K-12 cannot complement a waaL mutant of S. enterica sv. Typhimurium. The differences in WaaL proteins from E. coli R1 and R4 presumably reflect differences in the attachment site.
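For readers unfamiliar with the distinction between percent identity and percent total similarity quoted above, the following minimal sketch shows how the two figures can be computed from a pair of already-aligned sequences. It is an illustration only and not the procedure used in this study: the conservative-substitution groups are a hypothetical simplification (alignment programs normally use a scoring matrix), the choice of denominator (all aligned columns, including gapped ones) is an assumption, and the example sequences are invented.

```python
# Hypothetical conservative-substitution groups (an assumption for illustration;
# real alignment software derives similarity from a substitution matrix).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent total similarity) for two aligned sequences.

    Both inputs must have the same length, with '-' marking gaps.
    Total similarity counts identical plus conservatively substituted positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    identical = 0
    similar = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1  # assumed denominator: every remaining column, gaps included
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    if compared == 0:
        raise ValueError("no aligned positions to compare")

    percent_identity = 100.0 * identical / compared
    percent_total_similarity = 100.0 * (identical + similar) / compared
    return percent_identity, percent_total_similarity


if __name__ == "__main__":
    identity, total = identity_and_similarity("MKLV-A", "MRIVTA")
    print(f"{identity:.1f}% identity / {total:.1f}% total similarity")
```

Because total similarity includes identical positions, it is always at least as large as the identity value, as in each of the WaaL comparisons reported above.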
This study indicates that the outer region of R1-type core OS molecules (and by analogy, R4-type core OS molecules) is assembled by the completion of the main chain outer core OS backbone (HexI, -II, and -III), followed by substitution of HexIII and, finally, substitution of HexII. This conclusion is derived from the linkage data (Table II) of methylated R1 core OS derivatives, which show, for example, that CWG311 contains all core OS sugars except for the β-linked Glcp residue. Therefore, the WaaV protein of E. coli R1 (and the WaaX protein of E. coli R4) completes the assembly of the core OS molecule, and this activity allows the subsequent linkage between the core OS and O-PS to occur (Fig. 4). This order of assembly allows core OS completion while accommodating lateral transfer of the novel waaV and waaL genes into an ancestral core OS gene cluster. The order in which genes encoding outer core OS glycosyltransferases are transcribed in the long, central operon of the waa cluster of R1 core OS-type bacteria parallels the order in which these sugar residues are attached in the elongating core OS molecule. Translational products of the waa operon could potentially be incorporated into an ordered complex of transferases as they are made, such that elongating core OS molecules (attached to lipid A in the fluid cytoplasmic membrane) need simply pass across the face of this ordered complex. Completed lipid A-core molecules are subsequently translocated to the periplasmic face of the cytoplasmic membrane, where ligation to an O-PS may occur prior to the completed LPS molecule being transferred to the outer membrane. The later steps, which translocate the LPS molecule into the outer leaflet of the outer membrane, remain poorly understood.
The R1 core OS predominates in surveys of clinical E. coli (7,8). It remains to be established whether the genetic organization of the core OS region, its potential impact on waaL expression, and/or the structural features of the resulting LPS afford a selective advantage in pathogenic E. coli.