A novel pathway for O-polysaccharide biosynthesis in Salmonella enterica serovar Borreze.

The plasmid-encoded gene cluster for O:54 O-polysaccharide synthesis in Salmonella enterica serovar Borreze (rfbO:54) contains three genes that direct synthesis of a ManNAc homopolymer with alternating β1,3 and β1,4 linkages. In Escherichia coli K-12, RfbAO:54 adds the first ManNAc residue to the Rfe (UDP-GlcpNAc::undecaprenylphosphate GlcpNAc-1-phosphate transferase)-modified lipopolysaccharide core. Hydrophobic cluster analysis of RfbAO:54 indicates this protein belongs to the ExoU family of nonprocessive β-glycosyltransferases. Two putative catalytic residues and a potential substrate-binding motif were identified in RfbAO:54. Topological analysis of RfbBO:54 predicts four transmembrane domains and a large central cytoplasmic domain. The latter shares homology with a similar domain in the processive β-glycosyltransferases Cps3S of Streptococcus pneumoniae and HasA of Streptococcus pyogenes. Hydrophobic cluster analysis of RfbBO:54 and Cps3S indicates both possess the structural features characteristic of the HasA family of processive β-glycosyltransferases. Four potential catalytic residues and a putative substrate-binding motif were identified in RfbBO:54. In Δrfb E. coli K-12, RfbAO:54 and RfbBO:54 direct synthesis of smooth O:54 lipopolysaccharide, indicating that this O-polysaccharide involves a novel pathway for O-antigen transport. Based on sequence and structural conservation, 15 new ExoU-related and 17 new HasA-related transferases were identified.

Lipopolysaccharide (LPS) is a major component of the outer membrane of Gram-negative bacteria. The hydrophobic lipid A portion forms the outer leaflet of the outer membrane. In enteric bacteria, an O-polysaccharide is attached to lipid A via a short core oligosaccharide. The O-polysaccharide extends away from the cell surface to give a hydrophilic surface layer and represents a major surface antigen (O-antigen). It is characterized by a repeat unit structure, with different epitopes arising from variations in the nature, order, and linkage of the sugar monomers in the O-repeat unit and from the addition of side branches and "decorations" such as acetyl, ketal, and glycosyl residues.
Members of the genus Salmonella are serologically diverse. Approximately 50 O-antigens are recognized in Salmonella, and this diversity, combined with variable flagellar antigens, has resulted in the identification of greater than 2,000 serologically distinct Salmonella strains. Strains are assigned to serogroups based on shared expression of major O-antigen epitopes or factors (1). Synthesis of the O-antigen is directed by the products of the rfb gene cluster, and the structural diversity seen in the LPS O-antigens reflects variation in the rfb gene clusters; this variation is thought to have arisen from repeated lateral gene transfer and recombination events involving rfb genes. Selective pressure from the host immune response is proposed to be the driving force for the continued generation of antigenic diversity (2).
Serogroup O:54 is a heterogeneous group of 13 different Salmonella serovars. It is unique among Salmonella serogroups in that expression of the O:54 antigen requires the presence of a small plasmid. This is the only known plasmid-encoded O-polysaccharide in Salmonella. We have studied the 6.9-kb plasmid pWQ799 from serovar Borreze to determine the role of these plasmids in expression of the O:54 antigen (3). pWQ799 is a novel ColE1-related plasmid carrying the entire rfb O:54 biosynthetic cluster (4). This is the only reported example of a ColE1-related plasmid carrying genes for the synthesis of cell-surface antigens. The plasmid is mobilized in the presence of an appropriate helper plasmid, providing the first defined mechanism for lateral gene transfer of O-antigen biosynthesis genes in Salmonella enterica. Mobilization of plasmids containing rfb O:54 (5). Two LPS O-polysaccharide pathways have been described, and Rfe may play a role in either of these two pathways (reviewed in Ref. 6). The O:54 O-polysaccharide shares a number of characteristics with the O-polysaccharides synthesized by the Rfc-independent O-polysaccharide biosynthetic pathway including the following: 1) it is a homopolymer; and 2) Rfe is required for synthesis although GlcNAc does not form part of the repeating unit. The Rfc-independent pathway has so far only been identified for a few homopolymeric O-polysaccharides in Klebsiella O1 (7,8), Escherichia coli O8 and O9 (9-11), and Serratia marcescens O16 (12). In this pathway, Rfe initiates synthesis by transferring GlcpNAc-1-P to the carrier lipid to form the acceptor upon which the complete O-polysaccharide chain is subsequently polymerized. GlcpNAc therefore does not form part of the repeating unit.
O-polysaccharides synthesized by this pathway are polymerized in the cytoplasm (13), and then the completed polymer is transported across the plasma membrane by an rfb-encoded dedicated ABC (ATP-binding cassette) transporter, prior to ligation to lipid A core. Analysis of rfb O:54 and O:54 expression indicates that transport of the nascent O:54 polysaccharide occurs by a different mechanism. This communication describes the organization of rfb O:54 and the characterization of the two glycosyltransferases that direct the synthesis of the O:54 O-polysaccharide. Our studies indicate that synthesis of O:54 LPS follows a novel mechanism distinct from either of the two characterized O-polysaccharide pathways.
DNA Manipulation and Analysis-Restriction enzyme digestions, ligations, and CaCl2 transformations were performed as described by Sambrook et al. (15). DNA fragments to be subcloned were gel-purified using the GENECLEAN kit (Bio/Can Scientific, Mississauga, Ontario, Canada). Plasmids were column purified using QIAGEN spin columns (QIAGEN Inc., Chatsworth, CA) according to the manufacturer's instructions.
Nucleotide sequencing procedures are described elsewhere (4). Sequence data were edited and analyzed using AssemblyLIGN and MacVector software (International Biotechnologies Inc., New Haven, CT). Hydropathic analysis of the predicted protein sequence was done using the algorithm of Rao and Argos (18) with a minimum length for the transmembrane helix of 16. Homology searches of nucleotide and amino acid sequences in the National Center for Biotechnology Information data bases were done with the BLAST (basic local alignment search tool) server analysis program (19). Multiple alignments were performed using Clustal (20). Hydrophobic cluster analysis (HCA) was performed using the HCA-Plot program (Doriane Informatique, Le Chesnay, France). This program writes protein sequences on a duplicated α-helical net and circles clusters of hydrophobic amino acids (Ala, Val, Leu, Ile, Met, Phe, Gln). The plots are then visually compared for similarity in the hydrophobic cluster patterns, limiting analysis to the predicted globular portions of the proteins. β-Strands and α-helices are deduced based on the observed association of specific hydrophobic cluster shapes with secondary structures (21).
TnphoA Mutagenesis of Plasmid-encoded rfb O:54 Determinants-Plasmid-encoded rfb O:54 genes were mutagenized using TnphoA as described previously by Manoil and Beckwith (14). Fusions were made in plasmid pWQ800 (3). The precise site of insertion was mapped by sequencing out of the phoA orf using a primer (5′-CAGTAATATCGCCCTGAGCA-3′) that is complementary to the phoA sequence between nucleotides 79 and 98 (22).
RNA Isolation and Primer Extension-RNA was purified from a mid-log phase culture of E. coli DH5α (pWQ820) using TRIzol Reagent (Life Technologies, Inc.) according to the manufacturer's instructions. Primer extension reactions were done using the oligonucleotide primer TS154 (5′-TTTCATAATGTCGATCTGTTAATCC-3′) that corresponds to the complementary sequence to nucleotides 3676-3700 of plasmid pWQ799. The primer was end-labeled with [γ-32P]ATP (DuPont NEN) and T4 polynucleotide kinase (Boehringer Mannheim, Laval, Quebec, Canada) and then purified using a QIAquick spin column (QIAGEN Inc., Chatsworth, CA). Primer extension experiments were performed using the First-Strand cDNA Synthesis Kit from Pharmacia (Pharmacia Biotech, Baie D'Urfé, Quebec) following the manufacturer's recommendations. DNA sequencing was done with the Sequenase version 2.0 sequencing kit (U. S. Biochemical Corp.).
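As a quick consistency check on the primer description above, the following minimal Python sketch confirms that the length of TS154 matches the stated coordinate span (3676-3700) and derives the sequence to which the primer would anneal. The function and variable names are illustrative only; the pWQ799 sequence itself is not reproduced here, so the derived strand is simply the reverse complement of the primer.

```python
# Illustrative helper (not from the original methods): reverse-complement a DNA string.
def reverse_complement(seq: str) -> str:
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

# Primer sequence and coordinates as given in the text.
TS154 = "TTTCATAATGTCGATCTGTTAATCC"
start, end = 3676, 3700

# Number of nucleotides covered by the inclusive coordinate range.
span_length = end - start + 1

# Sequence to which the primer is complementary (read 5' to 3').
target_strand = reverse_complement(TS154)

print(len(TS154), span_length)
print(target_strand)
```

The same check applies to the phoA sequencing primer, which is 20 nucleotides long and is stated to span nucleotides 79 to 98.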
LPS Extraction and Purification of Core Oligosaccharides-LPS samples were prepared either from SDS-proteinase K whole cell lysates, as described by Hitchcock and Brown (23), or by using a modification of the phenol/water extraction method; LPS was collected from both the aqueous and phenol phases (3). For compositional analysis of the LPS core oligosaccharides, lipid A was removed from phenol-purified LPS by hydrolysis in 1.5% acetic acid at 100°C for 2 h. Precipitated lipid A was removed by centrifugation, and the supernatant was lyophilized. Samples were resuspended in water to a final concentration of 5 mg/ml, hydrolyzed, and analyzed by high performance anion-exchange chromatography as described previously (24).
Tricine-SDS-PAGE of LPS-LPS samples were analyzed by SDS-PAGE using commercially prepared 10-20% gradient Tricine gels from Novex (San Diego, CA). Electrophoresis conditions were those specified by the manufacturer. Gels were silver-stained using the method of Tsai and Frasch (25).

RESULTS
Organization of rfb O:54 -Previous analysis of the nucleotide sequence and genetic organization of pWQ799 indicated that approximately half of the 6915-base pair plasmid (nucleotides 72-3384) is involved in plasmid replication and mobilization (4). These regions are related to ColE1 and possess an average G + C content of 50-53% (Fig. 1A). This value is typical of Salmonella genomic DNA (26). In contrast, the remaining pWQ799 sequences have a uniformly lower average G + C content of 39%, with no detectable homology to any known ColE1-related sequences. The junctions between the high and low G + C regions in pWQ799 coincide with the 5′ and 3′ ends of the plasmid replicon regions. Abnormally low G + C values relative to those typical for the species are a common observation for genes involved in polysaccharide synthesis (27). These observations therefore suggested that the remaining DNA contained rfb O:54 determinants.
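The G + C comparison described above can be sketched as a sliding-window calculation. The example below is a minimal illustration, not the procedure used in this study: the window and step sizes and the toy sequence are assumptions, and the function names are hypothetical.

```python
# Percent G + C of a DNA string (illustrative helper).
def gc_content(seq: str) -> float:
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

# G + C content in consecutive windows; returns (0-based start, percent) pairs.
def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence: a G + C-rich block followed by an A + T-rich block.
toy = "GGCCGCGC" + "ATATTAAT"
for start, percent in windowed_gc(toy, window=8, step=8):
    print(start, percent)
```

Applied to a full plasmid sequence with a window of a few hundred base pairs, a profile of this kind would be expected to show a step between the replicon region and the remaining DNA, which is the type of junction described in the text.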
Computer analysis for coding regions combined with sequence homology searches in the NCBI data bases (see below) resulted in the identification of three potential open reading frames (Fig. 1B;  and −35 E. coli promoter sequences were detected in the region immediately upstream of rfbA O:54. To localize the rfb O:54 promoter, the 1.14-kb HincII-SspI fragment of pWQ799, containing rfbA O:54 and a 90-base pair 5′-flanking region, was cloned in front of the promoterless cat gene of pKK232-8. When transformed into E. coli K-12, the resulting plasmid (pWQ822; Fig. 1C) conferred chloramphenicol resistance, indicating the presence of an endogenous promoter. The transcription start site was mapped by primer extension of the oligonucleotide TS154 using total cellular RNA from E. coli DH5α (pWQ820; Fig. 1C). A single band was obtained in the resulting autoradiogram (data not shown), identifying the start site as nucleo-  (Fig. 1B). E. coli CC118 (pWQ800B6) was O:54-deficient and PhoA-negative on indicator media, indicating that this region of the protein is present in the cytoplasm. This cytoplasmic location is in agreement with the function of RfbA O:54 (see below) and with protein topology predictions by the positive-inside rule. This rule allows the prediction of the topology of a bacterial inner membrane protein based on the observation that positively charged amino acids (Arg + Lys) are more abundant in cytoplasmic loops (i.e. ~15%) than in periplasmic loops (~5%) (29). The Arg + Lys content of this region was 14%.
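The positive-inside rule described above reduces to a simple count. The sketch below computes the Arg + Lys percentage of a segment and assigns a side by comparing it with a cutoff. The 10% cutoff is an assumption chosen midway between the ~15% and ~5% figures quoted in the text, the segments are made-up examples rather than RfbA O:54 sequence, and the function names are hypothetical.

```python
# Percentage of positively charged residues (Arg + Lys) in a protein segment.
def arg_lys_percent(segment: str) -> float:
    segment = segment.upper()
    if not segment:
        raise ValueError("empty segment")
    return 100.0 * (segment.count("R") + segment.count("K")) / len(segment)

# Crude topology call; the 10% cutoff is an illustrative assumption,
# not a value taken from the text or from reference 29.
def predict_side(segment: str, cutoff: float = 10.0) -> str:
    return "cytoplasmic" if arg_lys_percent(segment) >= cutoff else "periplasmic"

# Made-up segments for illustration only.
loop_a = "RKAAAAAAAA"            # 2 of 10 residues are Arg or Lys
loop_b = "RAAAAAAAAAAAAAAAAAAA"  # 1 of 20 residues is Arg or Lys

print(arg_lys_percent(loop_a), predict_side(loop_a))
print(arg_lys_percent(loop_b), predict_side(loop_b))
```

Under this cutoff, the 14% value reported for the RfbA O:54 region would fall on the cytoplasmic side, in line with the PhoA fusion result.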
The translated amino acid sequence of RfbA O:54 shares significant homology with a number of putative bacterial glycosyltransferases (Fig. 2 and Table I). RfbA O:54 shares 32% identity with the predicted product of the rfb EcO7 gene orf275 (30), 26% identity with the hypothetical protein 6 of the lsg locus of Haemophilus influenzae, and 25% identity with the AmsE protein of Erwinia amylovora (31). Analysis of the protein alignments of these predicted products revealed that the sequence conservation was relatively uniform throughout the length of the proteins (Fig. 2). A number of protein sequences in the data bases were also identified with significant levels of homology over the N-terminal 192 amino acids of RfbA O:54. Alignment of this region gave identity levels of 21 and 20%, respectively, with the ExoU and ExoO proteins from Rhizobium meliloti (32-34) and 21% with the LgtA glycosyltransferase from Neisseria gonorrhoeae (35) (Table I).
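Identity levels such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention, in which identity is the fraction of matching residues among aligned columns that contain no gap in either sequence. This denominator is an assumption; the text does not state which convention was used, and the aligned strings are invented for illustration.

```python
# Percent identity between two pre-aligned sequences of equal length.
# Columns with a gap in either sequence are skipped (assumed convention).
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Invented alignment: five gap-free columns, four of them identical.
print(percent_identity("MKT-LL", "MRTALL"))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives lower values when the alignment contains many gaps.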
Previous studies of glycosyltransferases have demonstrated that there is often insufficient sequence similarity for functional predictions using traditional sequence alignments (36). However, transferases which catalyze the formation of glycosidic linkages with the same stereochemistry and with structurally related substrates are predicted to share a similar three-dimensional architecture in the catalytic and binding domains. This would be reflected in the presence of conserved structural regions or sequence motifs for shared mechanistic functions. Such domains can be identified using HCA (36,37). This method plots the two-dimensional pattern of protein sequences and allows visual comparison and detection of conserved structural features. Using HCA, Saxena et al. (36) compared the two-dimensional structure of five different glycosyltransferases of known catalytic functions, including ExoO and ExoU from Rhizobium meliloti. They identified a structural region, domain A, that was present in all five proteins. The transferases all possess a common catalytic activity: formation of a single glycosidic linkage with a β-configuration from α-linked nucleotide diphospho sugar donors. Domain A is therefore speculated to be directly involved in this shared activity. Since the region of similarity detected between RfbA O:54, ExoU, and ExoO includes the region containing domain A in the Rhizobium proteins, HCA was used to compare RfbA O:54 with ExoU and the other RfbA O:54 homologous proteins (Fig. 3).
Table I. Identical amino acids are indicated by asterisks; similar amino acids are indicated by the dots.
(36), the proteins were also found to contain at least one Asp residue in the loop at the C-terminal end of the β2 strand (Fig. 3). The other protein sequences obtained from the data base search with RfbA O:54 were also examined for the presence of these conserved features (Fig. 4). A total of 15 proteins, in addition to the 7 originally described by Saxena et al. (36), were identified as being members of the ExoU family (Table I).
The HCA-Plot program writes protein sequences on a duplicated α-helical net and circles clusters of hydrophobic amino acids (Ala, Val, Leu, Ile, Met, Phe, Gln). The plots are then visually compared for similarity in the hydrophobic cluster patterns, limiting analysis to the predicted globular portions of the proteins. Plots were aligned using the results of amino acid sequence alignments as a starting point. Hydrophobic clusters with obvious similarities were used as anchors for the structural alignment, as were regions containing glycines (◆) and prolines (★), which are often present in loops (21). Vertical lines were drawn to indicate structurally conserved features. The prediction of β-strands and α-helices is based on the observed association of specific hydrophobic cluster shapes with secondary structures (21). Amino acids are denoted by the one-letter code except for proline (★), glycine (◆), serine (⊡), and threonine (□). Conserved residues are circled. Regions i-iv indicate structural regions that appear to have a conserved two-dimensional architecture surrounding the (EDY) motif; these regions were not described in the initial characterization of the ExoU family.
(8,9). A third transferase activity is therefore predicted for the transfer of the first ManNAc residue to undecaprenol-P-P-GlcNAc, the product of the Rfe reaction (5). To determine which rfb O:54 gene product was responsible for the initial ManNAc transfer, plasmids carrying either rfbA O:54 (pWQ823; Fig. 1C) or rfbB O:54 (pWQ819; Fig. 1C) were transformed into different E. coli K-12 backgrounds, and LPS in the whole cell lysates of the transformants was analyzed by SDS-PAGE (Fig. 5). In E. coli DH5α (rfe+ rffE+), neither gene was sufficient for O:54 synthesis and expression of a ladder of O-antigen-substituted LPS.
Introduction of the two genes together on plasmid pWQ802, either in strain DH5α or the rfb-deleted strain S874, was sufficient for expression of authentic O:54 O-polysaccharide. These results were confirmed by Western immunoblot using absorbed polyclonal O:54 antisera (data not shown). Analysis of the LPS core regions in SDS-PAGE profiles revealed that in the presence of rfbA O:54, an additional LPS band was synthesized; this band migrated slightly slower than the lipid A core fraction of the host strain. In contrast, LPS from the rfbB O:54-containing strain was indistinguishable from that of the parental strain. Further analysis determined that the RfbA O:54-mediated band was not synthesized in an rfe− host strain, as would be expected for a biosynthetic pathway initiated by Rfe (Fig. 6A). The structure of the E. coli K-12 core oligosaccharide has been determined (38, 39) and does not include any ManNAc residues. The demonstration of ManNAc in the RfbA O:54-modified core would therefore be indicative of a ManNAc transferase function catalyzed by RfbA O:54. Purified core oligosaccharide from E. coli DH5α (pWQ823) was therefore analyzed by high performance anion-exchange chromatography. A single additional peak corresponding to mannosamine, the acid-hydrolyzed product of ManNAc, was detected in the chromatogram of the RfbA O:54-dependent core oligosaccharide (data not shown).
To determine the size of the RfbA O:54-dependent band, the LPS from E. coli DH5α (pWQ823) was analyzed by SDS-PAGE alongside two LPS samples which each contain a modified core fraction of known size and composition (Fig. 6B). Plasmid pJK2363 contains the Shigella dysenteriae galactopyranosyltransferase gene, rfpB (40). Rfe+ K-12 host strains expressing rfpB synthesize a core oligosaccharide modified by the addition
The RfbA O:54-modified lipid A core fraction was not present in E. coli S874 (pWQ823), indicating that synthesis of this LPS fraction requires one or more function(s) encoded by the K-12 rfb gene cluster (Fig. 6A). Expression of O:54 LPS in S874 (pWQ802) indicates that this requirement is overcome by the simultaneous activities of rfbA O:54 and rfbB O:54. Ligation of newly polymerized O-chains to lipid A core occurs at the periplasmic face of the plasma membrane and must therefore follow trans-plasma membrane transport of the undecaprenol-bound intermediate. A similar rfb K-12-dependent transport activity has been demonstrated for the RfbF KpO1 and RfpB-mediated core modifications (8,42). The E. coli K-12 O-antigen transporter, RfbX, is believed to be responsible for the transport of the lipid-bound RfbF KpO1 and RfpB products (8,42,43). Given the nature of this transport event and the involvement of both host and cloned plasmid functions, it was possible that the appearance of a single modified core band in E. coli DH5α (pWQ823) resulted from a substrate size limitation imposed by RfbX, rather than the monofunctional transferase activity of RfbA O:54. In this case, lipid-linked intermediates with higher degrees of polymerization would be synthesized but would remain in the cytoplasm, attached to carrier lipid. To address this possibility, the O-polysaccharide ligase-deficient strain E. coli CS2334 was transformed with pWQ823. Under such conditions any O-polysaccharide that is formed, but not transported, should accumulate in the cytoplasm. E. coli CS2334 (pWQ823) was phenol-extracted, and because the polymeric O:54 LPS partitions primarily into the organic phase (3), both phases were examined. The linkage between undecaprenol and carbohydrate polymer is phenol-labile (43) and, as a consequence, extracted O-haptenic material remains in the supernatant following a 100,000 × g centrifugation step. Supernatants from both the aqueous and phenol phases were size-fractionated on a Sephadex G-50 column, and fractions containing amino sugars were analyzed by 1H NMR. Although extracted high molecular weight ECA was detected in these experiments, no ManNAc-containing polymer was present (data not shown). These data confirm that RfbA O:54 transfers a single ManNAc residue.
Sequence Analysis of RfbB O:54 -RfbB O:54 is predicted to be a 53.3-kDa protein composed of 459 amino acids, with a calculated pI of 8.32. Hydropathic analysis predicted four transmembrane helices between residues 11-40, 325-340, 385-406, and 416-438. This protein is therefore expected to be an integral membrane protein. Based on the relative distribution of positive amino acid residues, the hydrophilic region between residues 340 and 384 is predicted to lie in the periplasm (6% Arg + Lys). This location was confirmed by construction of an in-frame PhoA-positive fusion (pWQ800B22) at amino acid 368 (Fig. 1B).
The  (Table II). Most of these proteins are glycosyltransferases involved in the synthesis of bacterial cell surface polysaccharides. Two of these proteins, Cps3S and HasA, are also predicted to be integral membrane proteins with similar hydropathy plots and four predicted transmembrane domains. Cps3S is a glycosyltransferase that directs the synthesis of the type 3 capsule of Streptococcus pneumoniae (45,46). The enzyme is bifunctional and processive, catalyzing the formation of [→3)-β-D-GlcA-(1→4)-β-D-Glc-(1→]n (45). HasA is the hyaluronic acid synthase from Group A Streptococcus pyogenes (47,48). This protein is also a bifunctional processive β-glycosyltransferase, catalyzing the formation of a polysaccharide with the structure [→4)-β-D-GlcA-(1→3)-β-D-GlcNAc-(1→]n (47).
Using HCA, Saxena et al. (36) compared the plots of HasA with a number of known processive β-glycosyltransferases. They reported a correlation between the presence of two conserved structural regions (domains A and B) and the shared catalytic activity. Domain A is common to both the ExoU and HasA families of β-glycosyltransferases, whereas domain B, located a short distance downstream of domain A, is unique to the HasA family. A single conserved Asp residue in region II and a conserved sequence motif (QXXRW) in region IV were both reported to characterize domain B. Comparison of the HCA plots of the large hydrophilic domains of RfbB O:54, Cps3S, and IcaA, with that of HasA, confirmed the presence of both domains in these proteins (Fig. 7). All three proteins possessed the conserved Asp and (QXXRW) motif of domain B, and the two conserved Asp residues characteristic of domain A were also present in RfbB O:54 and IcaA but not in Cps3S (Fig. 7). Although the Asp-β4 is conserved in Cps3S, there is no Asp immediately next to the β2 sheet of the protein; however, an Asp residue is located in the middle of α2 (Fig. 8). A search of the data bases identified a total of 17 other proteins, in addition to the 6 originally identified by Saxena et al., that possess the conserved features of the HasA family (Table II).
Very little is known about anabolic glycosyl transfer reactions, but it has been proposed that, mechanistically, this type of reaction may be viewed as the reverse reaction of the glycosyl transfer reaction performed by O-glycosidases, the difference being that the result is the extension rather than hydrolysis of an oligosaccharide or polysaccharide chain (51). By analogy with the extensively characterized polysaccharide hydrolase systems (reviewed in Refs. 51,52), this hypothesis predicts that formation of a β-glycosyl linkage from an α-linked sugar nucleotide donor would involve the same type of catalytic event as that of the inverting glycoside hydrolases. Hydrolysis of glycosidic bonds by inverting glycosidases results in a net inversion of configuration at the anomeric center of the reducing sugar product. The catalytic mechanism involves two acidic active site amino acids that act as acid-base catalysts. Among the cellulases and xylanases, in every instance where the catalytic residues have been identified, the amino acid has been either an aspartate or a glutamate (52). One of these residues is believed to act as the acid catalyst to protonate the substrate, whereas the other is thought to act as the base catalyst by deprotonating water. The two catalytic residues are located in flexible loop regions in the active site cleft, between substrate-binding subsites. Characterization of RfbA O:54-related proteins in data bases has identified 15 new members of the ExoU family, in addition to the 7 originally described by Saxena et al. (53). These new members all possess two conserved Asp residues within the putative catalytic domain (Fig. 4). The second, in the terminal loop of β4, falls in a more strictly conserved region of the proteins and is surrounded by additional acidic residues. In RfbA O:54, sequence and HCA plot alignments predict that the catalytic residues are Asp-41 and either Asp-93, Asp-95, or Asp-96.
HCA also identified a conserved motif (EDY) downstream of domain A, in a region containing amino acid clusters with a shape typical of ␣-helices (21). Carbohydrate ligands are believed to interact with binding subsites through a combination of ionic interactions, hydrogen bonding, and stacking interactions between aromatic amino acids and the hydrophobic patches of carbohydrate monomers (reviewed in Ref. 52). The conservation of the (EDY) motif and its position within a region of structural conservation, a short distance from a putative catalytic residue, suggests this motif might represent part of a binding subsite. Multiple sequence alignment revealed that a similar motif occurs in other members of this family (Fig. 4) although the order of the first two acidic amino acids may be reversed and in some cases the third amino acid may be a Lys. Although Tyr is an aromatic amino acid, both it and Lys are structurally similar, in that both have long chains and may participate in hydrogen bonding through either a terminal hydroxyl (Tyr) or amino (Lys) group. The acidic nature of Asp and Glu would also allow hydrogen bonding of these residues with a carbohydrate ligand. Given the predominance of aromatic amino acids surrounding the (EDY) motif, it is possible that the surrounding residues participate in stacking interactions with the hydrophobic patch of the glycosyl monomer. The only protein that did not possess the (EDY) motif was Dpm1. This enzyme catalyzes the transfer of mannose from GDP-man to dolichol phosphate (54). The fact that the ligand recognized by Dpm1 is not a carbohydrate is further support for the hypothesis that this motif is involved in substrate binding.
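The variants of the (EDY) motif described above (the first two acidic residues in either order, and Tyr or Lys in the third position) can be expressed as a short pattern. The sketch below is an illustration only: the regular expression is our reading of that description, the test sequences are invented, and a pattern match alone says nothing about the surrounding structural context that HCA evaluates.

```python
import re

# (ED or DE) followed by Y or K; the lookahead allows overlapping matches.
EDY_LIKE = re.compile(r"(?=((?:ED|DE)[YK]))")

# Return (1-based position, matched tripeptide) for every occurrence.
def find_edy_like(seq: str) -> list[tuple[int, str]]:
    return [(m.start() + 1, m.group(1)) for m in EDY_LIKE.finditer(seq.upper())]

# Invented sequences for illustration only.
print(find_edy_like("MAEDYGGDEKAA"))
print(find_edy_like("EDEY"))
```

In practice, hits would be filtered by position, keeping only those a short distance downstream of domain A.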
RfbB O:54 is predicted to be an integral membrane protein with four transmembrane helices, a large central hydrophilic domain and a periplasmic loop. Searches of the data bases  (45,46) and HasA from Group A S. pyogenes (47,48). Both proteins have four predicted transmembrane domains with a large central hydrophilic region, and HCA of these hydrophilic domains revealed that RfbB O:54 and Cps3S both possess the same multidomain structure and conserved residues previously identified in the HasA family of processive β-glycosyltransferases recently described by Saxena et al. (53). There are two conserved structural regions in proteins within this family. The first region (domain A) is common to both the ExoU and HasA families, and the second (domain B) is specific to the HasA family. Domain B is subdivided into four regions and has a conserved Asp residue at the C-terminal end of region II and a conserved sequence motif (QXXRW) at the C-terminal end of region IV. A search of the data bases for proteins possessing the conserved features of this family of proteins identified 17 new members (Table II). It is interesting to note that synthesis of the E. coli K5 capsule also involves a processive glycosyltransferase, KfiC (55). In this polysaccharide, the repeating unit consists of alternating β- and α-linkages. Saxena et al. (36) have proposed a model for the processive mechanism of polymerization in the HasA family of proteins. The model accounts for the characteristic multidomain architecture of the HasA family of proteins and, as with the monofunctional β-glycosyltransferases, is based on the hypothesis that inverting anabolic glycosyltransferases use the same catalytic mechanism as inverting glycoside hydrolases (51). According to this model, domains A and B represent different catalytic domains that together allow these proteins to catalyze two β-glycosidic bonds, either simultaneously or sequentially.
The subsequent loss of the two UDP groups from the catalytic sites is proposed to provide the driving force for the chain to move through the catalytic cleft until the terminal sugar interacts with the last binding subsite, allowing two more UDP sugars to enter. The simultaneous formation of two glycosidic linkages provides a simple mechanism for the generation of the 2-fold screw axis that arises from a disaccharide repeat with two β-glycosidic linkages, without invoking a concomitant rotation of either the enzyme or the substrate. It also provides an effective mechanism for maintaining the fidelity of a heteropolysaccharide disaccharide repeat. However, by analogy with the inverting glycosidases, such an activity would involve a total of four conserved acidic amino acid residues, two in each domain. Sequence and HCA plot alignments identified only one conserved acidic residue in domain B. Based on sequence conservation in the chitin synthases, Nagahashi et al. (57) have also speculated that this residue is involved in catalysis. To further substantiate this hypothesis, these workers used site-specific mutagenesis to replace this conserved Asp (Asp-562) with the longer Glu and observed a 100% reduction in enzyme activity. They also replaced Asp-562 with an Asn, to determine whether the hydroxyl group of Asp was necessary for activity. Not surprisingly, a complete loss of activity was observed in the mutant. Because the inverting mechanism predicts two catalytic residues in domain B, the sequences of all of the known transferases in the HasA family were examined for additional conserved amino acids. The search was limited to the region extending from the end of β4 to approximately 25 residues past the (QXXRW) motif. The C-terminal end of the search region corresponds to the start of a predicted transmembrane domain in RfbB O:54, Cps3S, HasA, IcaA, and NodC.
A highly conserved proline was identified a short distance in front of the (QXXRW) motif, in the junction between regions III and IV (Fig. 8). This Pro is predicted to lie within a loop at the C-terminal end of a β-sheet. In all of the proteins, an Asp or Glu was found 2-4 residues before the conserved Pro. It is possible that the conserved carboxylate adjacent to this Pro represents the second catalytic residue of domain B. In RfbB O:54, the catalytic residues are therefore speculated to be Asp-92 or Asp-94, Asp-151, Asp-244, and either Glu-268 or Asp-269.
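The search described above, for an acidic residue 2-4 positions before a proline that lies a short distance upstream of the (QXXRW) motif, can be sketched as follows. Several details are assumptions made for illustration: "a short distance" is taken as a window of at most 30 residues, "2-4 residues before" is read as offsets of 2, 3, or 4 from the proline, and the test sequence is invented. The function name is hypothetical.

```python
import re

# Find Asp/Glu residues located 2-4 positions before a Pro that lies within
# max_upstream residues upstream of a QXXRW motif (window size is an assumption).
# Returns tuples of 1-based (acid position, acid residue, Pro position, motif start).
def candidate_acids(seq: str, max_upstream: int = 30) -> list[tuple[int, str, int, int]]:
    seq = seq.upper()
    hits = []
    for motif in re.finditer(r"(?=Q..RW)", seq):
        q = motif.start()
        for p in range(max(0, q - max_upstream), q):
            if seq[p] != "P":
                continue
            for offset in (2, 3, 4):
                i = p - offset
                if i >= 0 and seq[i] in "DE":
                    hits.append((i + 1, seq[i], p + 1, q + 1))
    return hits

# Invented sequence: E3 and D4 precede P7, which precedes QAARW at position 11.
print(candidate_acids("AAEDGAPLLLQAARWAA"))
```

A scan of this kind only proposes candidates; in the analysis above the choice among them rests on conservation across the family and on the HCA alignments.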
The position of the (QXXRW) motif relative to the fourth potential catalytic residue in these proteins suggests that the motif may represent part of a binding subsite. This possibility is supported by the predicted interactions of the residues in the motif with a carbohydrate ligand: hydrophobic interactions between the aromatic Trp residue and the hydrophobic patch of a glycosyl monomer, and hydrogen-bonding interactions between the guanidinium side chain of Arg and the glycosidic hydroxyls. Noting that this motif was highly conserved in the chitin synthases, Nagahashi et al. (57) also speculated on the possible function of this motif in enzyme activity. To confirm a structure-function relationship, they used site-specific mutagenesis to individually replace each residue in the motif, and then measured enzyme activity and Km values in the resulting mutant proteins. In each case, a conservative change resulted in a reduction in activity and either an increase or decrease in binding affinity. While these results were interpreted as evidence for a role in catalysis, they could equally be interpreted as evidence for a role in binding of the substrate or in hydrogen-bonding interactions with a catalytic residue, thereby maintaining the correct orientation for catalysis.
There are presently two known pathways for O-antigen biosynthesis. These pathways are fundamentally different. Key criteria distinguishing the two are the cellular location of the polymerization reaction and the mode of export across the plasma membrane (6). In Rfc-dependent synthesis, polymerization involves block-wise addition of single O-repeat units and occurs at the periplasmic face of the plasma membrane. Individual O-units are assembled on undecaprenol-P in the cytoplasm and then transported across the plasma membrane, presumably by the O-unit transporter, RfbX (58). Ligation to lipid A core is catalyzed by RfaL. Rfc-independent O-polysaccharide biosynthesis is currently limited to homopolymeric O-polysaccharides. In this pathway, synthesis is initiated by the Rfe-dependent transfer of GlcNAc-1-P to undecaprenol-P. The complete O-polysaccharide chain is then polymerized in the cytoplasm prior to being delivered to the site of ligation by an rfb-encoded dedicated ABC transporter. At first glance, the pathway for O:54 polysaccharide synthesis appears similar to the Rfc-independent pathway of O-antigen synthesis. In both cases, synthesis is initiated by Rfe and the O-repeat is a homopolymer. In addition, by analogy with other, better-characterized HasA-related proteins, RfbBO:54 is expected to polymerize the O:54 polysaccharide in the cytoplasm, where pools of activated precursor are available. It is at this point in the pathway that O:54 biosynthesis diverges from the Rfc-independent pathway; there is no dedicated ABC transporter for export of the polymerized O:54 polysaccharide. This is the only known O-polysaccharide system that does not encode either an ABC transporter or an RfbX O-unit transporter. Despite the absence of either of these components, smooth O:54 LPS is expressed in a Δrfb E. coli K-12 host strain containing only rfbAO:54 and rfbBO:54. We have previously shown that, in the absence of the cognate ABC transporter, an E. coli K-12 strain containing the remaining rfb genes from K. pneumoniae O1 (Rfc-independent) accumulates O-antigen in the cytoplasm (13). There is, therefore, no alternate, generic O-polysaccharide export system in E. coli. Consequently, O:54 synthesis represents a new pathway for O-antigen assembly, involving a different mechanism for delivering nascent O-polysaccharide to the LPS O-antigen ligase.
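The distinguishing criteria discussed above can be summarized as a small decision sketch in Python. The class `OAntigenCluster`, its field names, and the function `classify_pathway` are hypothetical constructs introduced only to restate the argument; they are not an established classification tool, and the two example clusters encode only the properties stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OAntigenCluster:
    """Hypothetical summary of the features of an rfb cluster used in the text."""
    name: str
    rfe_initiated: bool         # synthesis starts with Rfe-dependent GlcNAc-1-P transfer
    has_rfc_polymerase: bool    # block-wise polymerization at the periplasmic face
    has_rfbx_transporter: bool  # O-unit transporter for single repeat units
    has_abc_transporter: bool   # dedicated ABC transporter for the complete chain


def classify_pathway(cluster):
    """Assign a cluster to one of the pathways distinguished in the discussion."""
    if cluster.has_rfc_polymerase and cluster.has_rfbx_transporter:
        return "Rfc-dependent"
    if cluster.rfe_initiated and cluster.has_abc_transporter:
        return "Rfc-independent"
    if cluster.rfe_initiated and not (
        cluster.has_abc_transporter or cluster.has_rfbx_transporter
    ):
        # Rfe-initiated, but neither known export component is encoded.
        return "O:54-like (no dedicated transporter)"
    return "unclassified"


# Feature assignments follow the statements in the text only.
o54 = OAntigenCluster("S. enterica sv. Borreze O:54", True, False, False, False)
kp_o1 = OAntigenCluster("K. pneumoniae O1", True, False, False, True)

print(o54.name, "->", classify_pathway(o54))
print(kp_o1.name, "->", classify_pathway(kp_o1))
```

The sketch makes the logic of the argument explicit: O:54 shares Rfe-dependent initiation with the Rfc-independent pathway but lacks both export components, so it falls outside the two previously described categories.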
Intriguingly, no export system has yet been identified for polymers produced by a number of other processive β-glycosyltransferases in the HasA family. These polysaccharides include bacterial cellulose (Acetobacter xylinum and Agrobacterium tumefaciens (53,59,60)), hyaluronic acid (S. pyogenes (47,48,61)), the type 3 capsule of S. pneumoniae (45,46), alginate (P. aeruginosa (62)), and chitin (S. cerevisiae, Candida albicans, Emericella nidulans, and Neurospora crassa (63-68)). In contrast, transmembrane transport of the rhizobial Nod factors occurs through the action of a dedicated ABC transporter (69,70). However, synthesis of Nod signal factor differs somewhat from that of cell-surface polysaccharides, as a carrier lipid has not yet been identified; the polymer that is synthesized is much shorter and is generally thought to be secreted (71); and the oligosaccharide product is substituted with acyl, acetyl, and sulfate groups in a strain-specific manner (72). The NodC proteins are all highly homologous, and their predicted topology differs from that of RfbBO:54, HasA, and Cps3S. NodC proteins may therefore represent a subfamily of the HasA family. The predicted topology of RfbBO:54, with a periplasmic loop following the cytoplasmic glycosyltransferase domain, suggests the possibility that the protein possesses two separate activities, catalyzing the polymerization of the O:54 polysaccharide and coupling this with transport in a vectorial reaction, similar to that suggested for chitin synthases (73). The C-terminal transmembrane domains would therefore be predicted to form a pore or channel through which the growing chain is extruded. Similarities in size, two-dimensional architecture, and hydropathy plots point to the possibility of a similar transferase/transport function for Cps3S, IcaA, and HasA.
Synthesis of the O:54 polysaccharide clearly represents a third and new pathway for O-antigen assembly. With the identification of potential catalytic residues in RfbBO:54 and the speculated role for the periplasmic loop in transmembrane transport, the O:54 O-polysaccharide provides a relatively simple system for testing this putative export function and for examining the mechanism of catalysis of β-glycosyltransferases. These analyses will also serve to characterize a novel O-polysaccharide biosynthetic pathway.