Crystal Structure of Bacteriophage SPP1 Distal Tail Protein (gp19.1)

Siphophage SPP1 infects the Gram-positive bacterium Bacillus subtilis using its long non-contractile tail and tail-tip. Electron microscopy (EM) previously allowed a low-resolution assignment of most orf products belonging to these regions. We report here the structure of the SPP1 distal tail protein (Dit, gp19.1). The combination of x-ray crystallography, EM, and light scattering established that Dit is a back-to-back dimer of hexamers. However, Dit could only be fitted into the virion EM maps as a single hexamer located between the tail-tube and the tail-tip. Structure comparison revealed high similarity between Dit and a central component of lactophage baseplates. Sequence similarity searches extended its relatedness to several phage proteins, suggesting that Dit is a docking platform for the tail adsorption apparatus in Siphoviridae infecting Gram-positive bacteria and that its architecture is a paradigm for these hub proteins. The structural similarity of Dit also extends to non-contractile and contractile phage tail proteins (gpVN and XkdM) as well as to components of the bacterial type 6 secretion system, supporting an evolutionary connection between all these devices.

Despite the diverse infection mechanisms displayed by Siphoviridae, which use surface proteins and/or cell-wall saccharides as receptors, their tail architecture is rather conserved. It is characterized by a long non-contractile tube, assembled by stacking several tens of homo-hexameric major tail protein (MTP) rings, and a central core formed by a few copies of the tape measure protein (TMP), which extends between both tail extremities and determines tail length. The proximal tail end (near the capsid) carries the homo-hexameric terminator that stops tube elongation during assembly, whereas the distal tail end (opposite to the capsid) carries the tail adsorption apparatus. In phages of Gram-positive bacteria this structure is composed of the distal tail protein (Dit), the tail fiber, and possibly ancillary proteins, all forming the baseplate (4). Although three-dimensional structures of siphophage MTP (5) and tail terminator (6) have been reported, no high-resolution structure is available for components of the adsorption device, with the exception of our recently reported phage p2 baseplate structure (7). Several tail components of the Bacillus subtilis phage SPP1 show significant sequence similarity to equivalent proteins from the lactococcal phages Tuc2009 and TP901-1. This is the case for the TMP (gp18), Dit (gp19.1), and tail-spike (gp21). However, these two lactococcal phages possess a large and bulky baseplate of 1 to 2 MDa anchored at the end of the tail-tube (3, 4, 7-9), whereas SPP1 displays only an elongated tail-tip (10). Therefore, despite the conserved tail architecture among these viruses, their host-adsorption apparatus distinguishes them.
The precise location of the different SPP1 tail and tail-tip proteins, as well as their structures and interactions, remains poorly understood. In this contribution, we report the purification of SPP1 Dit and the determination of its oligomeric state and size by light scattering/refractometry. We also determined the Dit structure by x-ray crystallography and electron microscopy (EM). Docking Dit into EM maps of the SPP1 virion (10) allowed us to reassign its position to the tail-tube end "cap" density. Based on structure and sequence comparisons between several Siphoviridae infecting Gram-positive bacteria, we suggest that Dit-like structures are a conserved feature of such phages, acting as a hub between the tail and the tail-tip/baseplate. Finally, a striking structural similarity was found between SPP1 Dit and Siphoviridae/Myoviridae tail components, as well as components of the bacterial type 6 secretion system (T6SS), providing additional evidence for a common origin of phage tails and the T6SS.

EXPERIMENTAL PROCEDURES
gp19.1 Cloning, Expression, and Purification-orf19.1 of B. subtilis phage SPP1 (NC_004166.2) was cloned into the pETG-20A expression vector according to standard Gateway™ protocols. The final construct encoded an N-terminal thioredoxin fusion followed by a hexahistidine tag and a TEV protease cleavage site before the gene of interest. The resulting plasmid was used to transform the E. coli T7 Express Iq pLysS strain (New England Biolabs). Cells were grown in Terrific Broth at 37°C until the optical density reached 0.5, and protein expression was induced with 0.5 mM isopropyl β-thiogalactoside overnight at 25°C. Selenomethionine-labeled protein was prepared following standard procedures in M9 minimal medium by blocking the methionine biosynthesis pathway, with an induction temperature of 25°C (11). Purification was performed as described elsewhere (9, 12). Briefly, after cell harvesting, lysis was achieved by adding 0.25 mg/ml lysozyme, followed by a freeze/thaw cycle and sonication. Soluble proteins were separated from inclusion bodies and cell debris by a 30-min centrifugation at 20,000 × g. We used an ÄKTA FPLC system to perform four steps: a Ni2+-affinity chromatography (5-ml HisTrap, GE Healthcare) with a 250 mM imidazole step gradient, an overnight TEV protease digestion at 4°C using a 1:10 (w/w) protease:target ratio, a second Ni2+-affinity step (to remove the His-tagged TEV and digested peptides), and a preparative Superdex 200 HR 26/60 gel filtration run in 10 mM HEPES pH 7.5, 150 mM NaCl. Orf19.1 was then concentrated to 6-8 mg/ml.
Molar Mass and Hydrodynamic Radius Determination by SEC/MALS/UV/QELS/RI-Size exclusion chromatography was carried out on an Alliance 2695 HPLC system (Waters) using a KW804 column (Shodex) run in 10 mM HEPES pH 7.5, 150 mM NaCl at 0.5 ml/min.
Multiangle static light scattering (MALS), UV spectrophotometry, quasielastic light scattering (QELS), and refractometry (RI) measurements were performed with a MiniDawn Treos (Wyatt Technology), a Photo Diode Array 2996 (Waters), a DynaPro (Wyatt Technology), and an Optilab rEX (Wyatt Technology), respectively, as described (8, 13, 14). Mass and hydrodynamic radius were calculated with ASTRA V software (Wyatt Technology) using a dn/dc value of 0.185 ml/g.
Crystallization and Structure Determination-Crystallization of selenomethionine-labeled and native gp19.1 was performed at 20°C in 96-well Greiner crystallization plates using a nanodrop-dispensing robot (Cartesian Inc.). Crystals grew in a few days by mixing 300 nl of protein at 6-8 mg/ml with 100 nl of 2 M NaCl, 0.1 M sodium acetate, pH 4.0. Crystals were cryoprotected with mother liquor supplemented with 26.5% glycerol and flash-frozen in liquid nitrogen. One SAD dataset was collected at the selenium K-edge (λ = 0.97818 Å) from a single crystal at the BM14 beamline (European Synchrotron Radiation Facility, Grenoble, France) using a MarCCD detector (Table 1). The native dataset was collected at the PROXIMA 1 beamline (SOLEIL, Gif-sur-Yvette, France) using an ADSC Q315r detector (Table 1). Data were processed and scaled using XDS (15), POINTLESS, and SCALA (16). The phenix.autosol and phenix.autobuild wizards were used to solve the structure (11 out of 15 selenium sites were found) and to perform initial model building (17-19). Manual model building was done with COOT (20) and TURBO-FRODO (21). The native Dit structure was solved by molecular replacement using MolRep (22) with one monomer as search model. Refinement was performed at 2.95 Å using Buster-TNT (23). At various stages of refinement, molecular replacement-SAD and averaged kick maps were calculated with Phaser (24), Sharp (25), and Phenix (26) to aid model building. Structure analysis was assisted by the PISA server (27) and PROMOTIF 2 (28). Electrostatic potentials were calculated with PDB2PQR (29) and APBS (30). Fitting of the Dit x-ray structure into EM densities was performed with Chimera (31), VEDA (32), and the COLORES and COLACOR modules of the Situs package (33). All figures were generated with Chimera.
Dit On-grid Negative Staining Sample Preparation, Image Processing, and Three-dimensional Reconstruction-3 µl of freshly prepared protein at ~10 µg/ml was applied to a 400-mesh glow-discharged carbon-coated copper grid. Excess Dit solution was blotted, and 4 µl of 1% uranyl acetate was applied twice to the grid and incubated for 1 min. Grids were then dried and kept in a desiccator cabinet until use. The grids were observed under low-dose conditions with a JEOL 2200FS transmission electron microscope operating at 200 kV. Images were recorded at ×50,000 magnification and digitized using a Nikon Coolscan 9000 ED with a step size of 10 µm. Individual particles (10,875) were extracted semiautomatically from 8 micrographs with Boxer (34) and corrected for the phase-contrast transfer function by phase-flipping. Image processing and single-particle three-dimensional reconstruction were performed using IMAGIC-5 (35) according to the classic single-particle reconstruction approach. Briefly, an initial model was computed using a C1 start-up procedure included in the ANGular-RE-Constitution program, followed by three iterative cycles of image alignment, class averaging, and three-dimensional reconstruction. The resolution of the final reconstruction, which included 9,027 particles, was estimated at 21.5 Å using the Fourier shell correlation criterion with a threshold of 0.5 (36). The final density map was then filtered at 21-Å resolution.
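The 0.5-threshold Fourier shell correlation (FSC) criterion can be illustrated with a generic sketch. It is not the IMAGIC-5 implementation: the function names, the one-voxel shell width, and the choice to report the last shell above the threshold (rather than interpolating) are assumptions made here. For orientation, a 10-µm scanning step at ×50,000 corresponds to 2 Å per pixel at the specimen, which is the pixel size used in the usage example.

```python
import numpy as np


def fsc_curve(map1: np.ndarray, map2: np.ndarray) -> np.ndarray:
    """FSC between two cubic 3D maps (e.g. half-set reconstructions).

    Returns one value per integer shell s = 1 .. N//2 - 1, where shell s
    collects Fourier voxels whose radius (in voxel units) rounds to s.
    """
    if map1.shape != map2.shape or map1.ndim != 3 or len(set(map1.shape)) != 1:
        raise ValueError("maps must be cubic 3D arrays of equal size")
    n = map1.shape[0]
    # Put the zero frequency at index n // 2 along every axis.
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    grid = np.indices(map1.shape) - n // 2
    shells = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    fsc = []
    for s in range(1, n // 2):
        sel = shells == s
        cross = np.real(np.sum(f1[sel] * np.conj(f2[sel])))
        norm = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        fsc.append(cross / norm if norm > 0 else 0.0)
    return np.array(fsc)


def resolution_at_threshold(fsc: np.ndarray, box_size: int, pixel_size: float,
                            threshold: float = 0.5) -> float:
    """Resolution (units of pixel_size) of the last shell before FSC first drops below threshold."""
    below = np.nonzero(fsc < threshold)[0]
    last_ok = len(fsc) - 1 if below.size == 0 else below[0] - 1
    if last_ok < 0:
        raise ValueError("FSC is already below the threshold in the first shell")
    shell = last_ok + 1  # fsc[0] corresponds to shell s = 1
    # Shell s has spatial frequency s / (box_size * pixel_size).
    return box_size * pixel_size / shell


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=(32, 32, 32))  # toy "structure", not Dit data
    half1 = signal + 0.5 * rng.normal(size=signal.shape)
    half2 = signal + 0.5 * rng.normal(size=signal.shape)
    curve = fsc_curve(half1, half2)
    print(np.round(curve, 2))
    print(resolution_at_threshold(curve, box_size=32, pixel_size=2.0))
```

In practice the two maps come from independent halves of the particle set, so the curve measures their mutual consistency; the 0.5 cut-off is one of several conventions in use.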

Dit Production and Light Scattering Characterization-SPP1 Dit was overproduced and purified to homogeneity, yielding 60 mg of purified protein per liter of culture. We used SEC/MALS/QELS/RI to characterize the Dit oligomeric state and size (8, 13, 14). The measured Dit mass and hydrodynamic radius were 325,629 ± 500 Da (supplemental Fig. S1) and 6.7 ± 0.4 nm, respectively, indicating that Dit is a dodecamer in solution (theoretical monomer mass: 28,489 Da).
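The step from measured mass to copy number can be made explicit with a short sketch using the values above. The helper names are illustrative, and restricting the candidates to hexamer-based assemblies (6, 12, or 18 subunits) is an assumption motivated by the hexameric ring architecture seen in the crystal structure. Without that restriction the nearest integer copy number would be 11 (the raw ratio is about 11.4, roughly 5% below 12), so the dodecamer assignment rests jointly on the mass and on the structural data.

```python
"""Sketch: copy number from a SEC-MALS mass (values taken from the text)."""

MONOMER_MASS_DA = 28_489    # theoretical Dit monomer mass
MEASURED_MASS_DA = 325_629  # SEC/MALS mass (reported uncertainty: 500 Da)


def mass_ratio(measured: float, monomer: float) -> float:
    """Unrounded ratio of the measured mass to the monomer mass."""
    return measured / monomer


def closest_assembly(measured: float, monomer: float, candidates: list[int]) -> int:
    """Candidate copy number whose predicted mass is closest to the measured mass."""
    return min(candidates, key=lambda n: abs(n * monomer - measured))


if __name__ == "__main__":
    ratio = mass_ratio(MEASURED_MASS_DA, MONOMER_MASS_DA)
    # Assumption: only hexamer-based assemblies are considered.
    n = closest_assembly(MEASURED_MASS_DA, MONOMER_MASS_DA, [6, 12, 18])
    expected = n * MONOMER_MASS_DA
    print(f"measured/monomer = {ratio:.2f}")  # 11.43
    print(f"closest hexamer-based assembly: {n}-mer, {expected:,} Da")  # 12-mer, 341,868 Da
    print(f"measured/expected = {MEASURED_MASS_DA / expected:.3f}")
```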
Dit Crystal Structure Description-The asymmetric unit of Dit crystals contains three monomers. The final model was refined at 2.95-Å resolution, with Rwork and Rfree values of 20.0 and 23.6%, respectively (Table 1). Applying crystallographic symmetry operators completes two hexameric rings stacked back-to-back, with 12 protruding domains located at the periphery (Fig. 1, A-C). The two rings are rotated relative to each other by 10°. The crystal packing is formed by piles of dodecamers aligned along their channel axis (unit cell a axis) and contacting each other through the tips of the 2 × 6 protruding domains on both faces. Each pile laterally contacts six neighboring piles through the sides of the protruding domains, the neighbors being shifted along the channel axis by one hexamer.
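The back-to-back double ring has 6-fold dihedral (D6) point symmetry: one 6-fold axis along the channel and six perpendicular 2-fold axes, each of which swaps the two rings. A minimal sketch of how 12 copies arise from one monomer under this symmetry follows, assuming the channel axis along z (in the crystal it runs along the a axis) and using a hypothetical one-atom "monomer"; it shows the point-group view only, not the space-group operators actually applied to the three monomers of the asymmetric unit. In this setup, a monomer at azimuth φ from a 2-fold axis yields rings offset by 2φ (modulo 60°), which is one way to picture the 10° inter-ring rotation.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def d6_operators() -> list[np.ndarray]:
    """The 12 rotations of D6: six about the 6-fold (z) axis, six 2-folds in the xy plane."""
    two_fold_x = np.diag([1.0, -1.0, -1.0])  # 180 degrees about x: maps one ring onto the other
    cyclic = [rot_z(2.0 * np.pi * k / 6) for k in range(6)]
    return cyclic + [r @ two_fold_x for r in cyclic]


def build_assembly(monomer_xyz: np.ndarray) -> np.ndarray:
    """Apply every D6 operator to an (m, 3) coordinate array; returns a (12 * m, 3) array."""
    return np.concatenate([monomer_xyz @ r.T for r in d6_operators()])


if __name__ == "__main__":
    # Hypothetical single "atom" 30 Å off-axis and 20 Å above the ring-ring interface.
    atom = np.array([[30.0, 0.0, 20.0]])
    copies = build_assembly(atom)
    print(copies.shape)  # (12, 3)
    print(np.unique(np.round(copies[:, 2], 6)))  # two z levels, one per ring
```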
The dodecamer central core (excluding the protruding domains) is a 75-Å high and 80-Å wide hollow cylinder made of a double ring. Including the 12 protruding domains that radiate from the core rings, the overall dimensions of Dit are 105 Å (height) and 140 Å (diameter). The circular core harbors a ~40-Å wide central channel whose inner surface is mainly lined by acidic residues and hence displays a strong negative electrostatic potential (Fig. 1, A, C, and D). This feature likely facilitates DNA traffic into the host cytoplasm during infection by preventing interactions of the DNA with the channel wall. A similar negatively charged conduit is observed in the SPP1 head-to-tail connector channel (37) as well as in the phage tail-terminator channel (6). The buried surface area at the interface between two neighboring monomers within a Dit hexameric ring is ~1,300 Å² per monomer (~10% of the monomer surface). A total buried surface of ~16,000 Å² thus ensures the cohesion of one hexamer. The interaction between the two hexameric rings of a dodecamer is mediated by a smaller buried surface area (~5,700 Å²). Each monomer was built in the electron density from residue 9 to residue 253, the C-terminal amino acid. The N terminus is situated at the periphery of the cylinder. Some blobs of extra density account for the first 8 residues, but their discontinuity precludes further modeling. In contrast, the C terminus is well ordered owing to interactions of its negatively charged carboxyl-terminal group with the Arg60 guanidinium moiety of the same monomer and the Asn85 amide nitrogen of a neighboring monomer.
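The hexamer total follows from the per-monomer value if, as assumed here, ~1,300 Å² is buried on each of the two partners at each of the six interfaces of a closed ring; the ~10% figure likewise implies a monomer surface of roughly 13,000 Å²:

```latex
A_{\mathrm{hexamer}} \approx 6\ \text{interfaces} \times 2\ \text{partners} \times 1{,}300\ \text{\AA}^2
  = 15{,}600\ \text{\AA}^2 \approx 16{,}000\ \text{\AA}^2,
\qquad
A_{\mathrm{monomer}} \approx \frac{1{,}300\ \text{\AA}^2}{0.10} = 13{,}000\ \text{\AA}^2
```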
Each Dit monomer can be divided into two domains corresponding to the N- and C-terminal parts of the polypeptide chain (Fig. 1, E-H). The N-domain (residues 9 to 135) is well defined in the electron density map and is composed of two layers comprising 8 β-strands organized in two β-sheets, a β-hairpin, and an α-helix (Fig. 1, E-G). The layer forming the wall of the central channel is folded as a four-stranded antiparallel β-sheet, constituted by strands 2, 5, 8, and 7b, from which extends a β-hairpin, termed the belt, encompassing strands 3 and 4 (residues 32 to 55). The belt projects toward the N-domain of a neighboring monomer and interacts with its four-stranded β-sheet, strengthening hexamer association (Fig. 1C). The outward layer forms the external wall of the cylinder and contains an α-helix (h1) and a three-stranded antiparallel β-sheet made of strands 1, 6, and 7a (Fig. 1, E-G).
The Dit C-domain (residues 136 to 253) protrudes out of the cylinder core and mediates all the crystal contacts. The C-domain folds as a nine-stranded β-sandwich organized in two β-sheets, plus an additional extended stretch (Fig. 1, E, F, and H). The domain starts at residue 136 with β-strand 1, which belongs to a five-stranded antiparallel β-sheet that also includes strands 9, 3, 6, and 7. Strand 2 initiates a four-stranded antiparallel β-sheet formed with strands 8, 4, and 5. An elongated segment (7b) links strands 7 and 8.
Dit EM Single-particle Reconstruction-A preliminary EM reconstruction from negatively stained Dit, computed without any enforced symmetry, yielded a convincing dodecameric structure made of two identical hexameric rings. As the single-particle three-dimensional reconstruction converged to a structure with clear 6-fold dihedral (D6) symmetry, we imposed this symmetry to improve its quality. The resulting structure is a thick-walled hollow cylinder delimiting a central channel. Six identical globular domains protrude regularly outwards from each of the two back-to-back hexameric rings (Fig. 2). The 6-fold symmetry axis coincides with the channel axis. The overall dimensions and shape of our Dit EM reconstruction (at 21-Å resolution) are in good agreement with those observed in the crystal structure. Indeed, correlation coefficient values of 82.3 and 69.2% were obtained when fitting the dodecameric Dit structure into the EM map with COLACOR and VEDA, respectively, revealing a similar architecture for the isolated molecule. Notably, the poor quality of the map in the protruding domain regions is likely due to their flexibility.
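The correlation-based fitting scores quoted above can be illustrated with a deliberately simple sketch: a mean-subtracted, normalized cross-correlation between two maps on the same grid, maximized by a brute-force search over integer-voxel translations. This is a hypothetical stand-in, not the COLACOR or VEDA algorithms, which also search rotations, work with a model map simulated at the stated resolution, and use more elaborate scoring and acceleration; the function names and the periodic (wrap-around) shifts are assumptions.

```python
import itertools

import numpy as np


def correlation_coefficient(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Pearson (mean-subtracted, normalized) correlation between two maps on the same grid."""
    a = map_a.ravel() - map_a.mean()
    b = map_b.ravel() - map_b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def best_integer_shift(target: np.ndarray, probe: np.ndarray, max_shift: int):
    """Exhaustive search over periodic integer translations of probe; returns (shift, cc)."""
    best_shift, best_cc = None, -np.inf
    span = range(-max_shift, max_shift + 1)
    for shift in itertools.product(span, span, span):
        moved = np.roll(probe, shift, axis=(0, 1, 2))
        cc = correlation_coefficient(target, moved)
        if cc > best_cc:
            best_shift, best_cc = shift, cc
    return best_shift, best_cc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    probe = rng.normal(size=(16, 16, 16))  # stand-in for a model map (toy data)
    target = np.roll(probe, (2, -1, 3), axis=(0, 1, 2)) + 0.3 * rng.normal(size=probe.shape)
    shift, cc = best_integer_shift(target, probe, max_shift=3)
    print(shift, round(cc, 3))  # recovers (2, -1, 3)
```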
Structural Similarity of Dit with Phage p2 ORF15-A DALI search (38) with the Dit monomer revealed a striking structural similarity with ORF15 (PDB code 2WZP, Z = 13.6, Fig. 3, A and B), a component of the phage p2 baseplate (7, 8). ORF15 is composed of an N-terminal domain (residues 1-120), whose hexamer forms a channel, with two layers of β-sheets and an α-helix associated with the external one. The ORF15 hexamer is very similar to the SPP1 Dit hexamer (Fig. 3, C and D) and is also maintained by six belt extensions. The ORF15 C-terminal domain (residues 121-298) is a β-sandwich remotely resembling galectin, as is the case for the SPP1 Dit C-domain (Z = 5.9 with PDB code 2YV8). The galectin domain of phage p2 ORF15, however, exhibits a long insertion of ~60 residues, termed the arm, which attaches ORF18 (the receptor-binding protein, RBP) to the central core of the baseplate. This feature is absent in SPP1 Dit, presumably because SPP1 Dit does not serve to assemble peripheral baseplate elements.
Dit Localization in the SPP1 Virion-Reconstructions of the SPP1 tail, cap, and tail-tip, before and after DNA ejection, were previously obtained using contrast EM (10). The tail-tube was observed to be formed by stacked hexameric MTP rings and to contact, at its distal extremity, a density volume termed the cap. It was proposed that the cap is formed by the TMP C-terminal domain. Further toward the tail-tip, a globular density located between the cap and the terminal density was assigned to a putative gp19.1 trimer. However, neither the Dit structures (x-ray or EM) nor its mass corroborate this assignment. Fitting the Dit dodecamer into the cap density envelope provides a significantly better match for one hexamer (located in the middle part of the cap density), whereas most of the second one lies outside the EM density (Fig. 4, A and B). A computational fitting of one Dit hexameric ring into the cap before DNA ejection, using COLORES, positions it either into the last ring assigned to MTP and the upper cap density (Fig. 4, C and D) or in the middle part of the cap with an opposite orientation (as for the proximal ring in the dodecameric fitting, Fig. 4G). In the cap after DNA ejection, only the former solution was obtained (Fig. 4, E and F). We performed thorough fitting experiments using COLACOR and VEDA to help assign the Dit orientation. Our results did not allow us to distinguish unambiguously which of the two orientations is correct, because in both cases the Dit central ring and the six protruding domains are accommodated into the cap density before and after DNA ejection (Fig. 4, C-H, and supplemental Table S1). We therefore propose that in the SPP1 virion, Dit consists of a single hexamer positioned in the cap region in either of the two proposed orientations. We thus suspect that the dodecameric Dit observed in solution and in the crystal is an artifact that might result from the absence of its partners when it is expressed alone in E. coli.
FIGURE 2. Single particle EM reconstruction from negatively stained Dit. A, the x-ray structure of the Dit dodecamer was fitted into the Dit EM reconstruction using Chimera, VEDA, and the COLACOR module of the Situs package. Fitting resolution was 21 Å. B, view rotated 90° relative to A.
The above Dit assignment is in agreement with the observation of a ~40-Å wide channel at the center of the protein, which matches the inner diameter of the tail-tube in the virion EM map after DNA ejection (42 Å) (10). We also observed that the electrostatic character of the continuous channel forming the DNA ejection pathway is conserved: most of its components (head-to-tail connector, tail terminator, and Dit) display a negatively charged tunnel surface. We expect this property to extend to the tail-tube interior, because the MTP of phage SPP1 has an acidic pI.
FIGURE 4. C-F views illustrate the "Cdown" conformation, and G and H the "Cup" conformation (supplemental Table S1). A, fitting of the Dit dodecamer into the cap before DNA ejection. B, fitting of the Dit dodecamer into the cap after DNA ejection. C, fitting of the Dit hexamer into the upper part of the cap density before DNA ejection. D, 90° view relative to C. E, fitting of the Dit hexamer into the upper part of the cap density after DNA ejection. F, 90° view relative to E. G, fitting of the Dit hexamer into the middle part of the cap density before DNA ejection. H, fitting of the Dit hexamer into the middle part of the cap density after DNA ejection.
Finally, we demonstrated that Dit remains associated with the tail after SPP1 DNA ejection, an event that leads to loss of the tail-spike (10). Virions, either untreated or incubated with YueB780 (the protein receptor on B. subtilis) to trigger DNA ejection, were loaded onto a sucrose gradient and ultracentrifuged. Dit co-sedimented with phages to the bottom of the gradient, confirming that it remains bound to the tail-tube after ejection (supplemental Fig. S2).

Dit, a Hub in Siphoviridae Infecting Gram-positive Bacteria-The TMP C-terminal moiety, Dit, and the gp21 N-terminal part of SPP1, TP901-1, and Tuc2009 (4) exhibit sequence identities of 24, 32, and 26%, respectively (10). If these sequence similarities translate to the structural level, similar tail assembly mechanisms can be expected. Genomic analyses identified numerous Dit-like proteins in phages in which the tail-spike N-terminal moiety and, to a lesser extent, the TMP C-terminal region show sequence identity to the SPP1 equivalent proteins (supplemental Fig. S3). The conserved order of the TMP, Dit, and tail-spike genes in these phage genomes further argues for a conserved strategy of tail assembly initiation in Siphoviridae involving cross-talk between the three proteins. Dit thus appears to be a widespread maestro that orchestrates tail and adsorption apparatus assembly, acting as a hub where the tail-tube, the tail-spike, and possibly the baseplate anchor. The convergence of all observations made on phages infecting Bacillus and Lactococcus strongly suggests that its role is conserved among numerous distant phages. We propose that Dit is a structural paradigm in siphophages infecting Gram-positive bacteria, and possibly also in the Gram-negative world (see below).
Structural Similarity between SPP1 Dit and Phage Tail-tube Proteins-The DALI search also revealed a marked structural similarity between SPP1 Dit and tail proteins of non-contractile and contractile phages. The Dit N-domain is reminiscent of the N-terminal part of gpV (gpVN), the major tail protein constituting the tail-tube of phage λ, a Siphoviridae infecting Gram-negative bacteria (PDB code 2Q4K, Z = 5.6, supplemental Fig. S4A). A structural resemblance can also be established between Dit and the XkdM tail-tube protein of PBSX, a B. subtilis prophage known to form Myoviridae particles when induced (PDB code 2GUJ, Z = 2.4, supplemental Fig. S4B). All three protein domains share a similar fold made of two antiparallel β-sheets arranged in two layers plus an α-helix. This finding reinforces and extends the previously proposed hypothesis that contractile and non-contractile phage tails may share an evolutionary connection (6, 7). Considering the high structural conservation among proteins involved in the assembly of different parts of the tail, and their occurrence in distant phage families (Siphoviridae and Myoviridae), we suggest that most phage tail components derive from a single ancestral protein module, which has evolved to provide all these different proteins.
Evolutionary Connection between SPP1 Dit and the T6SS-SPP1 Dit is also structurally similar to EVPC (Z = 5.4), Hcp1 (Z = 5.6) (39), and Hcp3 (Z = 5.8), three protein components of the T6SS of the Gram-negative bacteria Edwardsiella tarda and Pseudomonas aeruginosa. The overall fold of EVPC (PDB code 3EAA), Hcp1 (PDB code 1Y12), and Hcp3 (PDB code 3HE1) is very close to the Dit N-domain fold, i.e. two antiparallel β-sheets plus an additional β-hairpin extension (not modeled in Hcp3). The three structures superimpose well onto the SPP1 Dit N-domain as well as onto each other (supplemental Fig. S5 and Table S2). The EVPC, Hcp1, and Hcp3 crystal structures show that these proteins also form hexameric rings with overall dimensions matching those of the SPP1 Dit N-domain hexamer and harboring a 40-Å wide central channel. In all cases, the β-hairpin extension plays the same role in maintaining hexamer cohesion. EVPC is most probably homologous to Hcp1/Hcp3, as suggested by their amino acid sequence identities (32/21%) and similarities (49/40%) and by their structural resemblance (supplemental Table S2). These results constitute additional evidence for an evolutionary relationship between components of the T6SS and phage tails, as previously proposed on the basis of the structural similarity between phage λ gpVN and P. aeruginosa Hcp1 (6, 39) and between phage T4 gp27 and E. coli VgrG (40).
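Reported identity percentages depend on the chosen denominator (ungapped columns, alignment length, or the length of the shorter sequence). As a hedged illustration, the sketch below uses one common convention, dividing identical positions by the number of ungapped alignment columns; it is not necessarily the convention behind the values quoted above, the sequences are toy fragments rather than Dit or Hcp sequences, and similarity scores would additionally require a substitution-group or scoring-matrix definition that is not modeled here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns")
    identical = sum(1 for x, y in columns if x.upper() == y.upper())
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy pre-aligned fragments (hypothetical): 5 identical out of 7 ungapped columns.
    print(percent_identity("MKV-LTAE", "MRVDLSAE"))
```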
The Dit evolutionary trajectory in phages with and without a baseplate is possibly driven by a selective pressure imposed by their distinct modes of interaction with host cells. Virions like SPP1 might have originated from a phage with a baseplate that acquired the capacity to recognize a proteinaceous receptor and thus lost the need for a baseplate with multiple RBPs. Loss of this structure and of its Dit docking motif (e.g. the galectin domain arm of phage p2 ORF15 (Fig. 3D)) would follow. The Dit galectin domain could subsequently have acquired (or already possessed) the capacity to bind cell-wall saccharides, providing adhesion to cell surfaces. Conversely, the Dit galectin domain of an SPP1-like phage could have evolved toward complex RBP-accommodating baseplates, followed by loss of the tail-spike ability to bind a proteinaceous receptor. The galectin-like domain would then play only a structural role, the irreversible and specific recognition of saccharides being performed by RBPs. Both evolutionary scenarios imply two events: first, acquisition of a new receptor-binding activity, generating a phage carrying a baseplate (saccharidic receptor binding) or a tail-spike (proteinaceous receptor binding); second, loss of one of the receptor activities. These events, which lead to switches in receptor and host specificities, likely involve gene exchange within the genetic pool of phages infecting Gram-positive bacteria, in particular when they share a common host, as lactophages do. Interestingly, some Dit homologs have large C-terminal extensions that might provide additional functions to the hub protrusions (supplemental Fig. S3). Although we cannot trace which of these mechanisms shaped the history of the SPP1 and lactococcal phage host adsorption apparatus, they are clearly related by close evolutionary links, possibly through another lactococcal phage such as c2.