Structure of the Bundle-forming Pilus from Enteropathogenic Escherichia coli*

Bundle-forming pili (BFP) are essential for the full virulence of enteropathogenic Escherichia coli (EPEC) because they are required for localized adherence to epithelial cells and auto-aggregation. We report the high resolution structure of bundlin, the monomer of BFP, solved by NMR. The structure reveals a new variation in the topology of type IVb pilins with significant differences in the composition and relative orientation of elements of secondary structure. In addition, the structural parameters of native BFP filaments were determined by electron microscopy after negative staining. The solution structure of bundlin was assembled according to these helical parameters to provide a plausible atomic resolution model for the BFP filament. We show that EPEC and Vibrio cholerae type IVb pili display distinct differences in their monomer subunits, consistent with data showing that bundlin and TcpA cannot complement each other, but assemble into filaments with similar helical organization.

A wide variety of pathogenic bacteria express fimbriae or pili, surface appendages that in many cases allow the microbe to attach to host epithelia. This attachment is often the first step toward colonization of their preferred host niche. The most important classes of pili are the chaperone-usher family and the type IV fimbriae. Type IV pili are produced by a vast array of important human pathogens including Vibrio cholerae, Neisseria gonorrhoeae, Neisseria meningitidis, Pseudomonas aeruginosa, Legionella pneumophila, Salmonella enterica serovar Typhi, and enteropathogenic Escherichia coli (EPEC). A number of important phenotypes are associated with the expression of type IV pili including auto-aggregation (1-3), adherence (4-6), twitching motility (7,8), biofilm formation (9), horizontal gene transmission (10-12), cellular invasion (13,14), and virulence (1,15,16).
The bundle-forming pili (BFP) of EPEC are an excellent model for the study of type IV pili. BFP are required for the characteristic localized adherence of EPEC to epithelial cells (5,17), for auto-aggregation, and for full virulence in volunteers (1). The 14-gene bfp operon contains all of the genes specifically required for BFP biogenesis (18). Mutagenesis studies have shown that all but one of these genes are necessary to produce functional BFP (19-21).
The first gene of the bfp operon, bfpA, encodes prebundlin, which upon cleavage yields a 19-kDa protein that is the only known structural component of BFP (5). Prebundlin is associated with the inner membrane with its 12-amino acid N-terminal leader sequence in the cytoplasm and the C-terminal, globular head domain located in the periplasm. Prebundlin is processed into mature bundlin after removal of the N-terminal leader sequence by the prepilin peptidase, BfpP (22). It has also been shown that DsbA, a periplasmic oxidoreductase, stabilizes the globular head domain by catalyzing the formation of a well conserved disulfide bond in the C-terminal part of bundlin (23). Substitution of either of the two cysteine residues with serine is detrimental to pilus biogenesis and EPEC cell adhesion (23). The conserved disulfide bond is a common feature of type IV pili (24). In addition, type IV pili share sequence similarity in the fifty N-terminal amino acids, which have been assumed to form an extended, hydrophobic N-terminal helix that promotes the assembly of type IV fimbriae (25).
As BFP from EPEC represents an attractive system to study the assembly of type IVb pili, we have investigated the structure of the major subunit of BFP filaments, bundlin, by NMR. Here we report the high resolution structure of the 18.3-kDa globular head domain from bundlin and reveal a new variation in the topology of type IVb pilins. In combination with EM data collected on native BFP we were able to propose an atomic resolution model for the type IVb filament structure.

EXPERIMENTAL PROCEDURES
NMR Spectroscopy-The 18.3-kDa ¹⁵N-¹³C-labeled recombinant bundlin domain was expressed and purified as described previously (26). NMR samples of bundlin (350 µl) were prepared at a concentration of about 0.5 mM in 20 mM sodium phosphate buffer, pH 5.2, 0.1% NaN₃, and 3-(trimethylsilyl)-propionic acid for referencing ¹H, ¹⁵N, and ¹³C chemical shifts. NMR data were acquired at 303 K on a 500 MHz four-channel Bruker DRX500 spectrometer equipped with a z-shielded gradient triple resonance cryoprobe. Sequence-specific backbone and side chain assignments for bundlin were determined previously (26). Three-dimensional 500 MHz ¹H-¹⁵N and 800 MHz ¹H-¹³C NOESY-HSQC data were collected with a mixing time of 100 ms. Spectra were processed with NMRPipe (27) and analyzed using NMRview (28). ¹⁵N relaxation rate measurements were performed as previously described (29,30) on an 800 MHz Varian Inova spectrometer. For the measurement of T1, ten experiments were performed with relaxation delays of 8. 6, 9.7, 193, 795, 996, 1493, 2000, 193, and 996 ms. Ten experiments were also performed for the measurement of T2 with relaxation delays of 8. 5
Structure Calculations-Secondary structure elements in bundlin were first identified using the chemical shift-based, dihedral angle prediction software TALOS (31). A manual analysis of ¹⁵N and ¹³C NOESY-HSQC spectra allowed the backbone NOEs characteristic of secondary structures to be identified. Additional long distance NOEs were assigned between side chain protons of residues Ile47 and Tyr104. The manually assigned NOEs were used to start the automated, iterative assignment procedure ARIA 1.2 (32,33), which employed the structure calculation program CNS (34). Hydrogen bonds were identified from H₂O/D₂O exchange experiments using ¹H-¹⁵N HSQC spectra recorded every hour over 24 h. The disulfide bridge between Cys116 and Cys166 (numbering referenced to bundlin) was included in the calculations. In the final iteration, 100 structures were calculated, and the 15 lowest energy structures with no distance or dihedral angle violations greater than 0.2 Å and 5°, respectively, were retained and water-refined in ARIA. A summary of NMR-derived restraints and statistics on the ensemble of structures is shown in TABLE ONE.
Bacterial Culture-50 µl of EPEC bacterial culture, wild-type strain E2348/69, were used to seed growth at a 1:100 dilution in Hepes-modified Dulbecco's modified essential medium and grown for 3 h without shaking at 37 °C. Bacteria were collected by centrifugation at 4000 × g for 5 min at room temperature, and the pellet was resuspended in 50 µl of phosphate-buffered saline. The bacterial suspension was diluted 2-fold in phosphate-buffered saline, and tobacco mosaic virus (TMV) was added at 10 µg/ml as a calibration control.
Bacteria were negatively stained using a modified version of Valentine's technique. 2 µl of the bacterial/TMV suspension were pipetted between a thin carbon film and its mica support (1 mm²). The carbon film was washed by partially floating onto water in a Teflon block and then floated off completely on 1% uranyl acetate stain. The carbon film was then collected on a lacey grid (Ted Pella) from below and viewed in an FEI CM200 TEM operating at 120 kV. Images of bacteria expressing BFP were recorded in low-dose mode (using Kodak SO163 film) at a magnification of ×36,000 with the first zero of the contrast transfer function in the range of 18-22 Å.
Image Analysis-Suitable micrographs were selected based on electron optical parameters (such as appropriate defocus, stigmation, and absence of drift) and were digitized using a Nikon8000 Coolscan densitometer at 6.35 µm/pixel. All computational analysis was carried out using IMAGIC software (35).
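As a rough check on the digitization step, the scanner step size can be converted to a sampling interval at the specimen. The sketch below assumes the nominal magnification of ×36,000 is exact and that the scan step is 6.35 µm per pixel; the function name is hypothetical, and the measured spacings in this work were calibrated against TMV rather than derived this way.

```python
def pixel_size_angstrom(scan_step_um: float, magnification: float) -> float:
    """Return the sampling interval at the specimen in angstroms per pixel.

    scan_step_um: densitometer step size on the film, in micrometers.
    magnification: nominal microscope magnification (assumed exact here).
    """
    angstrom_per_um = 1.0e4  # 1 micrometer = 10,000 angstroms
    return scan_step_um * angstrom_per_um / magnification


if __name__ == "__main__":
    size = pixel_size_angstrom(6.35, 36_000)
    print(f"{size:.2f} Å/pixel")
```

Under these assumptions the sampling is a little under 2 Å per pixel, comfortably finer than the 18-22 Å first zero of the contrast transfer function.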
Two-dimensional Projection Image-Individual filaments were divided into segments of 350 Å in length, which were then normalized, band-pass filtered, masked, aligned, and summed to obtain a higher contrast, average two-dimensional projection image. In addition, larger areas containing multiple, naturally aligned BFP were selected, aligned, and summed, as above.
Fourier Transformation-The Fourier transforms (FTs) of the individual pili, the summed projection image, and individual and summed areas of BFP were calculated and compared. All transforms resulted in common layer lines. The distance between the layer lines was measured to determine the axial repeat. A lattice that incorporated the major reflections was drawn on the diffraction pattern resulting from the summed FTs of individual BFP. Bessel orders were calculated from the spacing of the reflections from the meridional line, as described under "Results."
Computational Assembly of Filament-A space-filling fiber was built based on the NMR structure of the bundlin monomer and the helical parameters and measurements derived from the EM data. First, the 22-amino acid N-terminal helix was restored to the bundlin monomer. The orientation of the bundlin monomer was then carefully optimized by analogy with the type IV pilus models (36). The long N-terminal α-helix of the bundlin monomer was first oriented roughly parallel to the central axis of the pilus, with the monomer tilted by about 10° with respect to the central axis to avoid clashes in the BFP filament. Furthermore, the bundlin monomer was oriented such that the N-terminal helix was positioned inside the helical strand, forming the innermost layer of the helical fiber. Hlxbuild software (situs.biomachina.org) was then used to assemble a single helix six subunits long. The helical rise was set to 22 Å; the radial displacement of each chain from the helical axis (x-offset in hlxbuild) was optimized to 23 Å, which gave an outer diameter close to that measured by EM (~75 Å) and avoided subunit clashes in the final structure.
The resulting helical strand was then duplicated by opening two further copies using Swiss-Pdb (37) to form a three-stranded helical fiber. Each copy was then translated by 44 and 88 Å, respectively. The helical strands then were merged to create the final representation.
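The assembly geometry described above can be sketched numerically. The following minimal Python example places one reference point per subunit using the stated rise (22 Å), radial offset (23 Å), and strand translations (44 and 88 Å). It is not the hlxbuild algorithm: the 60° rotation per subunit is an assumption derived from six subunits per turn, the sign used for handedness is a convention chosen here, and all names are hypothetical.

```python
import math

RISE = 22.0                        # Å, axial rise per subunit within one strand (from the text)
RADIUS = 23.0                      # Å, radial offset of each chain from the axis (from the text)
SUBUNITS_PER_TURN = 6              # assumption: six subunits per turn of each strand
STRAND_SHIFTS = (0.0, 44.0, 88.0)  # Å, axial translations of the three strand copies
HAND = -1                          # assumed convention: -1 gives a left-handed strand


def strand_positions(n_subunits, z_shift=0.0):
    """Return (x, y, z) reference points for one helical strand."""
    step = HAND * 2.0 * math.pi / SUBUNITS_PER_TURN  # rotation per subunit, radians
    return [
        (RADIUS * math.cos(i * step), RADIUS * math.sin(i * step), i * RISE + z_shift)
        for i in range(n_subunits)
    ]


def fiber_positions(n_subunits=6):
    """Return three strands, each an axially translated copy of the first."""
    return [strand_positions(n_subunits, shift) for shift in STRAND_SHIFTS]


if __name__ == "__main__":
    for strand_index, strand in enumerate(fiber_positions()):
        for x, y, z in strand:
            print(f"strand {strand_index}: x={x:7.2f} y={y:7.2f} z={z:7.2f}")
```

With these numbers, subunits from different strands that sit at the same height are 120° apart around the axis, which is why translating copies by 44 and 88 Å is equivalent to building a 3-start arrangement.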
Coordinates Deposition-Coordinates for the ensemble of NMR structures have been deposited at the Protein Data Bank under the accession code 1zwt. Tables of NMR assignments and restraints have been deposited in the BioMagResBank in Madison, WI under the accession code 6003.

RESULTS
Description of the Structure of Bundlin-Recombinant, double-labeled ¹⁵N-¹³C mature bundlin lacking the hydrophobic N-terminal 22 residues was expressed and purified as described previously (26). Backbone resonance and side chain assignments were obtained using standard triple resonance NMR experiments (38). A correlation time of about 11.5 ns was calculated from ¹⁵N relaxation data (T1/T2 ratios along the sequence shown in Fig. 1A), which is consistent with that of the 18.3-kDa bundlin monomer. Overall, ¹⁵N relaxation data for the backbone amides of bundlin reveal a well ordered structure with very few highly mobile regions (Fig. 1). Interestingly, resonance assignments were not achieved for residues Lys167, Asn168, and Thr169 as signals were not observed in the NMR spectra (numbering referenced to bundlin). Furthermore, high T1/T2 fluctuations are observed for the region between residues 162 and 173 without a decrease of ¹H-¹⁵N NOE (Fig. 1A). This is indicative of conformational exchange for the protein backbone in this region, as is also evidenced by ¹H-¹H NOE cross-peaks of much lower intensity than those exhibited by the ordered parts of the protein.
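To illustrate how a T1/T2 ratio relates to an overall correlation time, the sketch below uses a widely used approximation for slowly tumbling proteins, τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), where νN is the ¹⁵N resonance frequency. This is an assumption for illustration; the text does not state which procedure was used. The gyromagnetic ratio factor is approximate, the function names are hypothetical, and the input ratio of 24 is an illustrative value, not a measurement from this work.

```python
import math

GAMMA_RATIO_N_TO_H = 0.10136  # approximate |gamma(15N)| / gamma(1H)


def n15_frequency_hz(proton_mhz):
    """15N resonance frequency in Hz for a given 1H spectrometer frequency in MHz."""
    return proton_mhz * 1.0e6 * GAMMA_RATIO_N_TO_H


def tau_c_ns(t1_over_t2, proton_mhz=800.0):
    """Estimate the rotational correlation time (ns) from a 15N T1/T2 ratio.

    Uses tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N), valid only in the
    slow-tumbling limit and for residues without exchange or fast motion.
    """
    if 6.0 * t1_over_t2 <= 7.0:
        raise ValueError("T1/T2 ratio too small for this approximation")
    nu_n = n15_frequency_hz(proton_mhz)
    return math.sqrt(6.0 * t1_over_t2 - 7.0) / (4.0 * math.pi * nu_n) * 1.0e9


if __name__ == "__main__":
    # Hypothetical ratio chosen for illustration only.
    print(f"tau_c = {tau_c_ns(24.0):.1f} ns")
```

Residues undergoing conformational exchange, such as the 162-173 region discussed above, inflate T1/T2 and would normally be excluded before applying an estimate of this kind.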
The ensemble of 15 low energy structures with neither distance nor dihedral violations greater than 0.2 Å or 5°, respectively, is shown in Fig. 1B. The three-dimensional structure of the globular head domain of bundlin is composed of a seven-stranded mixed parallel and antiparallel β-sheet surrounded by four α-helices and two 3₁₀-helices (Fig. 1C). The backbone root mean square deviation is 0.79 ± 0.14 Å for the secondary structure elements and 1.09 ± 0.19 Å for all atoms (TABLE ONE). The four helices encompass residues Lys32-Tyr51 (α1-C), Leu60-Asn67 (α2), Lys113-Ala120 (α3), and Thr158-Ala165 (α4), with α3 and α4 conjoined by the disulfide bond between Cys116 and Cys166. The loop region (52-59) connecting α1 and α2 displays a higher degree of flexibility than the rest of the molecule, as shown by a decrease of ¹H-¹⁵N NOEs within this region (Fig. 1A). In contrast, the second loop (68-80) connecting α2 to the β-sheet possesses a short 3₁₀-helix from Asp73 to Tyr75 that is rigidly associated with the rest of the secondary structure.
FIGURE 2. Electron microscopy of BFP from EPEC. A, electron micrograph of negatively stained wild-type EPEC expressing BFP. B, naturally occurring, aligned bundles of BFP shown at a higher magnification. Thicker filaments are TMV whose 23-Å spacing was used as a calibration standard. C, projection of a two-dimensional average of aligned, summed, short individual segments of BFP. The pitch of the main 3-start helix is indicated, together with the approximate diameter of the pilus. Protein is shown as white. Scale bars: 200 nm.
DECEMBER 2, 2005 • VOLUME 280 • NUMBER 48 JOURNAL OF BIOLOGICAL CHEMISTRY 40255
The β-sheet region of bundlin is flanked by the four α-helices on three sides. The α1-C helix is positioned across the concave face of the sheet by hydrophobic interactions between Ile47/Tyr104, Thr44/Tyr104, and Ala43/Phe87. Residues Ile47, Tyr51, Tyr57, Leu60, Ile64, Val93, and Pro95 delineate a hydrophobic platform located between the first two helices and the first part of the β-sheet. Significant hydrophobic interactions involving residues Ile47, Tyr51, Tyr57, and Tyr104 lead to a well-defined orientation for α2 of 90° relative to α1-C. The long loop, 68-80, following α2 also adopts a fixed orientation (Fig. 1, C and D), mediated by hydrophobic interactions between residues Leu65, Pro72, Tyr75, and Tyr105.
Analysis of BFP by Electron Microscopy-BFP expressed by wild-type EPEC were viewed after negative staining (Fig. 2, A and B). The majority of BFP was expressed in early stage growth and formed closely associated swathes of pili emanating from the bacterial membrane. Clearly discernible individual pili were chosen for further analysis. Short sections ~350 Å in length were selected, band-pass filtered, normalized, aligned, and summed, resulting in the two-dimensional averaged structure shown in Fig. 2C. The image clearly shows a repeating diagonal pattern of protein densities (white) along the pilus, which is indicative of a helical structure. From this averaged projection image, the pili are estimated to have a diameter of about 75 Å, and the pitch of the dominant helix was estimated to be 44 Å, as shown in Fig. 2C. Fourier transforms were calculated from the two-dimensional projection image, from individual pili, and also from larger areas selected from the naturally occurring aligned bundles of pili. All Fourier transforms have major layer lines in common.
The Fourier transform (FT) shown in Fig. 3, A and B, is derived from the sum of transforms calculated from individual pili and was used to deduce the helical parameters of BFP. All pili were centered prior to calculation of their FT. This summed pattern was used because it shows the layer lines most clearly. A two-dimensional lattice was drawn on the FT to intercept the major reflections (Fig. 3B). The distances between these reflections and the equatorial line were then measured, as shown. Three main reflections are visible with spacings of 1/88 Å, 1/44 Å, and 1/22 Å on layer lines 4, 8, and 12, respectively. The distance between layer lines was measured as 1/175 Å, which corresponds to the axial repeat defined by the predicted 1-start helix. The 1/22 Å meridional spacing was only visible in some FTs, but defines the axial rise of the subunits as 22 Å. The distances between the major reflections and the meridional line were also measured (d), and the Bessel order (n) was then calculated using the equation n = 2πRr − 2, where R = 1/d and r is the radius of the filament. The Bessel orders describe two helical families comprising a 6-start helix (layer line 4) with a pitch of 88 Å and a 3-start helix (layer line 8) with a pitch of 44 Å. It is notable that the main reflection consistently seen on all the calculated FTs corresponds to the 3-start helix, which appears to be the dominant feature. This 3-start helix, with an axial rise of 22 Å and a pitch of 44 Å, thus has six subunits per turn, which compares with the 3-start 45-Å helix with six subunits per turn described for the Tcp of V. cholerae (25). All spacings were measured using the 1/23 Å third layer line of TMV as a calibration standard.
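The arithmetic behind these helical parameters can be sketched as follows. The example assumes the usual approximation for the position of the first Bessel maximum, n ≈ 2πRr − 2 with R = 1/d, and reads "six subunits per turn" as the lead of one strand (number of starts × pitch) divided by the axial rise per subunit within that strand. The function names and the input values d = 47 Å and r = 37.5 Å are hypothetical illustrations, not measurements reported here.

```python
import math


def bessel_order(d_angstrom, radius_angstrom):
    """Approximate Bessel order from the reflection's distance to the meridian.

    d_angstrom: real-space spacing corresponding to that distance (R = 1/d).
    radius_angstrom: filament radius r.
    """
    reciprocal_radius = 1.0 / d_angstrom  # R, in 1/Å
    return 2.0 * math.pi * reciprocal_radius * radius_angstrom - 2.0


def subunits_per_turn(n_start, pitch_angstrom, rise_angstrom):
    """Subunits in one full turn of a single strand of an n-start helix."""
    lead = n_start * pitch_angstrom  # axial advance of one strand per full turn
    return lead / rise_angstrom


if __name__ == "__main__":
    # Hypothetical d and r, chosen only to show the calculation.
    print(f"n ≈ {bessel_order(47.0, 37.5):.2f}")
    # Values from the text: 3-start helix, 44 Å pitch, 22 Å rise.
    print(f"subunits per turn = {subunits_per_turn(3, 44.0, 22.0):.0f}")
```

In this reading each strand advances 132 Å per turn, so a 22 Å rise gives six subunits per turn, matching the value quoted for the 3-start helix.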

Bundlin Structure Provides New Insights into the Type IV Family-The NMR structure of bundlin, the major subunit of BFP from EPEC, adopts an α/β fold composed of four α-helices that surround a platform formed by a seven-stranded, twisted, contiguous β-sandwich. Our study reveals a new topology for the mixed parallel and antiparallel β-strands, arranged in the following order: β2, β3, β1, β7, β4, β6, and β5. Although common characteristics can be observed between bundlin and other type IV pilin structures, the new structural features reported here for bundlin have not been described for other members of the type IV pilin family.
The type IV pilin family is divided into two groups, types IVa and IVb, distinguished by the length of the pre-pilin leader sequence and the size of the pilin protein (36). The first high resolution structures of pilin proteins were those from the type IVa class: GC from N. gonorrhoeae (42), and PAK and K122-4 from P. aeruginosa (39,40). The type IVa pilin structure is characterized by an extended N-terminal α-helix and a globular head domain that is folded into an α-β roll configuration (39,40,42). The globular head domain is well conserved and characterized by a four-stranded antiparallel continuous β-meander among all the type IVa pilins (36). The structures of TcpA from V. cholerae and PilS from S. typhi, both from the type IVb group, have been determined recently by x-ray diffraction (25) and NMR (41), respectively. The type IVb pilin proteins are characterized by a longer leader peptide and a larger globular head domain, and display a significantly different topology from the type IVa proteins (25,36). The globular head domain of bundlin also exhibits a novel variation of the arrangement of β-sheet and α-helices. Furthermore, crystallographic analysis of the pseudopilin PulG from Klebsiella oxytoca reveals the four internal anti-parallel β-strands characteristic of type IV pilins, but the highly variable loop region with a disulfide bond is not found (43).
Type IVb pilins are characterized by two regions, the αβ-loop and the D-region. The αβ-loop includes the first two N-terminal α-helices (α1 and α2), whereas the D-region is delineated by helices α3 and α4, which are linked by the conserved disulfide bond (25,36). Comparison of the bundlin structure with that of TcpA and PilS (Fig. 4) reveals some structural similarities in the arrangement of strands in the central region of the β-sheet, but bundlin adopts several unique structural features. The structure-based sequence alignment and superimposition of bundlin, TcpA, and PilS are shown in Figs. 5 and 6.
Both similarities and striking differences are observed among the topologies of the three proteins. The αβ-loop region of the three proteins differs in size and in the distribution of secondary structure elements. The αβ-loop in TcpA is defined by the two N-terminal α-helices, α1 and α2, followed by a long loop (25,36), whereas an additional antiparallel β-strand is observed in the equivalent loop of PilS (41) (Figs. 5 and 6). The shorter αβ-loop region in bundlin, composed of α1-C and α2, is significantly more structured and displays an altered orientation with respect to the long N-terminal helix. In contrast to the αβ-loop, fragments in the D-region show some structural equivalence among the three pilins, i.e. within α3, α4, and the short region encompassing the β4- and β6-strands in bundlin (Figs. 5 and 6). It is noteworthy that, after the superimposition of the equivalent regions between bundlin and both TcpA and PilS, some hydrophobic positions in the D-region appear to be conserved among the three proteins (Fig. 6).
A common architectural characteristic of TcpA and PilS is the inclusion of the C-terminal β-strand in the middle of the structure, forming an antiparallel arrangement. In contrast, the C-terminal β-strand β7 in bundlin forms a mixed parallel and anti-parallel β-sheet with the adjacent β1- and β4-strands, because of the novel insertion of the short β1-strand between β3 and β7 (Fig. 4). As expected, the structure-based sequence alignment reveals that the fragment containing the β1-strand in bundlin is absent in both TcpA and PilS sequences (Figs. 5 and 6). Despite this major difference, equivalence can be established between β2 of bundlin and β1 and β3 of TcpA and PilS, respectively, and among the C-terminal β-strands of all three pilins. The region delineated by β3 of bundlin, β2 in TcpA, and β4 in PilS appears to be equivalent in the three proteins, with some conserved hydrophobic residues (Fig. 6).
BFP Model and Biological Implications-By combining the monomer structure with the native structure from electron microscopy we are able to present a model of the pilus. This provides some insight into the function of this important virulence factor. Our NMR data reveal that exclusion of the first thirty amino acids of bundlin ("the N-terminal hydrophobic helix") impairs the ability of the pilin to assemble into BFP. Helical reconstruction of BFP was constrained by experimental data obtained from EM of negatively stained filaments and, as suggested in other models of type IV pili (36,43), by the positioning of the N-terminal hydrophobic helix within the central cavity of the pilus. Although the flexibility or bending of the N-terminal helix cannot be assessed from our data, our proposed model of the BFP filament displays an innermost layer formed by a coiled-coil structure of the highly conserved N-terminal α-helices (Fig. 7B) (36). Subunit-subunit interactions between globular head domains involve residues from the αβ-loop and D-region and are located on the edges of the outermost layer, in the fragment immediately preceding the α2-helix and in the α4-helix, respectively (Fig. 7C). The 3-start left-handed helix of our model is similar to other recently published models and suggests that type IV pili could be assembled from three sites simultaneously. This concept is consistent with the recent observation that alternating subunits of the hexameric BfpD ATPase engage the pilus biogenesis machine, implying there are three sites at which pilin subunits are added concurrently (44). Although the helical parameters determined by EM for TCP and BFP filaments are very similar, local interactions and forces that maintain the structural integrity of the pilus are likely to be different, as we observe striking differences within the αβ-loop and D-regions. For instance, the α2-helix within the αβ-loop from the BFP is more solvent-exposed and not involved in significant subunit-subunit contacts.
This is consistent with observations that replacing key elements of bundlin with the corresponding regions from TcpA renders EPEC unable to produce pili (45). An analysis of sequence variations in bfpA revealed nine different alleles that can be divided into two major classes termed α and β (46,47). Sequence variations were concentrated in the C terminus. In the BFP structure, the majority of these variable residues are also located in the outer rim of the pilus (Fig. 7E). In particular, a region of the bfpA gene encoding amino acids 115-152 has an excess of non-synonymous substitutions, indicating that this portion of the protein is under selective pressure for diversification. One potential source of this selective pressure is the host immune response. Consistent with this hypothesis, this region is surface-exposed in our BFP model (Fig. 7E). Antibodies that recognize epitopes within this region might be protective against EPEC infection and thereby select for EPEC variants that have altered epitopes.
FIGURE 6. Structure-based sequence alignment between bundlin, TcpA, and PilS. Sequence alignment was based on the results given by Dali between bundlin and TcpA. Alignment between bundlin and PilS was also reported as structural equivalence can be observed. Schemes of secondary structure corresponding to each three-dimensional structure are represented above each primary sequence. For the sake of clarity, only the α-helices and β-strands were annotated in bundlin. Residues highlighted in blue are identical in the three proteins. Residues shown in red correspond to hydrophobic residues found in the same position within the three proteins. Conserved cysteines are labeled in green.
One of the more striking findings in this study is the marked difference in secondary and tertiary structure among pilin proteins from the type IVb pilus family. Despite these differences the pilin genes have recognizable sequence similarities and the pili themselves have virtually identical structures and function. Evidently, powerful evolutionary forces contribute to diversification in pilin structure while pilus structure and function are maintained.